1. Introduction


This paper is concerned with a key question in the use of recursive Monte Carlo methods for system optimization, when the system operation and cost are of interest over a long period of time. For many control systems, the control is given a priori in a parametrized form, and to use Monte Carlo methods for optimizing the parameter one needs good estimators of the derivatives of the cost function with respect to the parameter.

Reference [1] develops a very useful method for doing this when the system is of the diffusion or related type and the control interval of concern is finite. Numerical approximations to the unbiased estimators were developed and analyzed, and simulations showed that the method can be superior to competing methods if the system dimension is large or the system is nonlinear. In this paper, the results of [1] are extended to the ergodic cost problem. New difficulties arise, since we essentially need to deal with derivatives of the invariant measures with respect to the control parameters and with the convergence of suitable computable approximations. Owing to these “ergodic” problems, the assumptions are stronger here than in [1].

Let $x(\cdot)$ be defined by the diffusion

$$dx = b(x,\alpha)\,dt + \sigma(x)\,dw, \qquad x \in \mathbb{R}^r, \tag{1.1}$$

where $a(x) = \sigma(x)\sigma'(x)$ is non-degenerate and $\alpha$ is a control parameter to be chosen. For each $\alpha$ of interest, let $x(\cdot)$ have a unique invariant measure $\mu(\alpha)$. Precise conditions will be given below. For a ‘smooth’ cost rate $k(\cdot)$, define the “ergodic cost”

$$\langle \mu(\alpha), k(\alpha) \rangle = \int \mu(dx,\alpha)\,k(x,\alpha) = \bar k(\alpha). \tag{1.2}$$
We wish to get an unbiased estimator of $d\bar k(\alpha)/d\alpha$ (as well as reasonable ‘numerical’ approximations from sample simulations) at selected values of $\alpha$. Such estimators are necessary if we wish to minimize $\bar k(\alpha)$ over $\alpha$ by some recursive Monte Carlo (stochastic approximation) method.

Control problems are frequently of this type; i.e., the control is given in a parametric form. Often, a full optimal feedback control is not desired, since it might be very hard to implement and not all the state variables are available. But a good class of parametrized controls might be known. See [1] for some examples and further motivation, as well as a discussion of alternative approaches.

Generally, one cannot easily evaluate $\bar k(\alpha)$ or its derivatives. One might then seek a method for getting good estimators that can be used in a recursive Monte Carlo optimization method. The ease of getting the estimates and their quality are key issues in such an approach. The estimators are to be obtained by simulations of (1.1) or of approximations to (1.1), since the solution of (1.1) cannot be known exactly.

Reference [1] developed a general “likelihood ratio derivative” based method for getting such estimators, under conditions which are much broader than those used in this paper, but for a ‘finite time’ problem. The numerical data in [1], and those obtained subsequently, show that the method can be quite superior to its competitors for nonlinear and high-dimensional systems. The quality of the estimator is judged by the “variance per CPU time required.” The reader is referred to [1] for more motivation and examples. The ergodic cost problem is harder and requires stronger conditions (hence the non-degeneracy). Actually, the method has been successfully tested on many degenerate problems of the type used in [1], so the conditions which our analysis requires can undoubtedly be weakened. There are ready extensions to the jump-diffusion, reflection and other standard models. In order to introduce the idea, we give a brief informal review of one idea in [1], but using our slightly different terminology and under stronger conditions than those used in [1].

For given $T < \infty$, define the “finite time” costs

$$C(x,\alpha) = \int_0^T k(x(s),\alpha)\,ds + k_0(x(T),\alpha), \qquad \bar C(x,\alpha) = E_x^\alpha C(x,\alpha),$$

where $E_x^\alpha$ denotes the expectation with parameter $\alpha$ and $x(0) = x$. We always use $\alpha_0$ to denote the point at which the derivative is to be taken. With no loss of generality, $\alpha$ will be a real number, since for the vector case we can estimate the derivative for each component separately. Let $P_x^\alpha(T)$ denote the measure induced by the solution to (1.1), with the initial condition $x(0) = x$, on $C^r[0,T]$, the space of $\mathbb{R}^r$-valued continuous functions on $[0,T]$ with the sup norm. Let $b(x,\alpha)$, $k(x,\alpha)$ and $k_0(x,\alpha)$ be $\alpha$-differentiable, and define $\alpha = \alpha_0 + \delta\alpha$ and $\delta b(x,\alpha_0,\delta\alpha) = b(x,\alpha_0+\delta\alpha) - b(x,\alpha_0)$. Define

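$$\zeta(0,T;\alpha_0,\delta\alpha) = \int_0^T [\sigma^{-1}(x(s))\,\delta b(x(s),\alpha_0,\delta\alpha)]'\,dw(s) - \frac12\int_0^T |\sigma^{-1}(x(s))\,\delta b(x(s),\alpha_0,\delta\alpha)|^2\,ds,$$

where $w(\cdot)$ is the Wiener process driving (1.1) with $\alpha = \alpha_0$, and the Radon-Nikodym derivative

$$\frac{dP_x^{\alpha_0+\delta\alpha}(T)}{dP_x^{\alpha_0}(T)} = \exp\zeta(0,T;\alpha_0,\delta\alpha). \tag{1.3}$$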
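The Radon-Nikodym derivative (1.3) can be checked numerically. The following is a minimal sketch, assuming a one-dimensional Euler discretization (not part of (1.3) itself) and the hypothetical model $b(x,\alpha) = -\alpha x$, $\sigma = 1$, $x(0) = 1$. Paths are simulated only at $\alpha_0$ and reweighted by the discrete analogue of $\exp\zeta(0,T;\alpha_0,\delta\alpha)$, which for the Euler chain is exactly the likelihood ratio of its Gaussian transitions; the reweighted estimate of $E_x^{\alpha_0+\delta\alpha}x(T)^2$ is compared with a direct simulation at $\alpha_0+\delta\alpha$.

```python
import numpy as np

def girsanov_reweight(alpha0=1.0, dalpha=0.2, x0=1.0, T=1.0, n_steps=100,
                      n_paths=200_000, seed=1):
    """Hypothetical example: Euler scheme for dx = -alpha x dt + sigma dw.

    Returns (mean weight, reweighted estimate, direct estimate) of
    E^{alpha0 + dalpha} x(T)^2.
    """
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    sigma = 1.0

    # (i) Simulate at alpha0 and accumulate the discrete zeta(0, T; alpha0, dalpha).
    x = np.full(n_paths, x0)
    zeta = np.zeros(n_paths)
    for _ in range(n_steps):
        dw = np.sqrt(dt) * rng.standard_normal(n_paths)
        db = -dalpha * x                      # delta b(x, alpha0, dalpha), left endpoint
        zeta += (db / sigma) * dw - 0.5 * (db / sigma) ** 2 * dt
        x = x - alpha0 * x * dt + sigma * dw  # Euler step with b(x, alpha0)
    weights = np.exp(zeta)
    reweighted = np.mean(weights * x ** 2)

    # (ii) Simulate directly at alpha0 + dalpha for comparison.
    y = np.full(n_paths, x0)
    for _ in range(n_steps):
        dw = np.sqrt(dt) * rng.standard_normal(n_paths)
        y = y - (alpha0 + dalpha) * y * dt + sigma * dw
    direct = np.mean(y ** 2)
    return weights.mean(), reweighted, direct
```

The mean weight should be close to one, reflecting $E_x^{\alpha_0}\exp\zeta = 1$; the function name and all parameter values are illustrative.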

Define $Z(\cdot)$ by

$$Z(T,\alpha_0) = \int_0^T [\sigma^{-1}(x(s))\,b_\alpha(x(s),\alpha_0)]'\,dw(s) = \int_0^T [b_\alpha'(x(s),\alpha_0)\,a^{-1}(x(s))]\,[dx(s) - b(x(s),\alpha_0)\,ds]. \tag{1.4}$$

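We use the subscripted $b_\alpha(x,\alpha_0)$, etc., to denote the $\alpha$-derivatives at $\alpha_0$. Then the quantities

$$Q(\alpha_0) = \int_0^T [k(x(s),\alpha_0)Z(s,\alpha_0) + k_\alpha(x(s),\alpha_0)]\,ds + k_0(x(T),\alpha_0)Z(T,\alpha_0) + k_{0,\alpha}(x(T),\alpha_0), \tag{1.5}$$

$$\tilde Q(\alpha_0) = \int_0^T [(k(x(s),\alpha_0) - \bar k(x(s),\alpha_0))Z(s,\alpha_0) + k_\alpha(x(s),\alpha_0)]\,ds + (k_0(x(T),\alpha_0) - \bar k_0(x(T),\alpha_0))Z(T,\alpha_0) + k_{0,\alpha}(x(T),\alpha_0), \tag{1.5'}$$

are unbiased estimators of $\bar C_\alpha(x,\alpha_0)$, where in (1.5') we use the means

$$\bar k(x(s),\alpha_0) = E_x^{\alpha_0}k(x(s),\alpha_0), \qquad \bar k_0(x(T),\alpha_0) = E_x^{\alpha_0}k_0(x(T),\alpha_0).$$

Since these means are deterministic and $Z(\cdot,\alpha_0)$ is a zero-mean martingale, the centering in (1.5') does not affect unbiasedness. Thus, if a path of $x(\cdot)$ is available, one can calculate or approximate (1.5) or (1.5').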
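A hedged sketch of the estimators (1.5) and (1.5') for the hypothetical scalar model $b(x,\alpha) = -\alpha x$, $\sigma = 1$ (so $b_\alpha(x,\alpha) = -x$), with $k(x,\alpha) = x^2$, $k_0 = 0$, $x(0) = 0$, and an Euler grid in place of the diffusion. $Z$ is accumulated from the first form of (1.4). In (1.5') the exact mean $\bar k(x(s),\alpha_0)$ is replaced by the cross-path sample mean at each grid time, an illustrative assumption (this evaluation is costly in general). The reference value is a central difference of the exactly computed Euler-scheme cost $\Delta\sum_n E x_n^2$.

```python
import numpy as np

def lr_estimators(alpha0=1.0, T=1.0, n_steps=100, n_paths=200_000, seed=2):
    """Per-path values of the discrete versions of (1.5) and (1.5')."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    x = np.zeros(n_paths)          # x(0) = 0
    Z = np.zeros(n_paths)          # Z(s, alpha0) on the grid
    Q = np.zeros(n_paths)          # (1.5): k_alpha = 0 and k_0 = 0 here
    Q_tilde = np.zeros(n_paths)    # (1.5') with a sample-mean stand-in for k-bar
    for _ in range(n_steps):
        k = x ** 2
        Q += k * Z * dt
        Q_tilde += (k - k.mean()) * Z * dt
        dw = np.sqrt(dt) * rng.standard_normal(n_paths)
        Z += -x * dw                # b_alpha(x) = -x, sigma = 1
        x = x - alpha0 * x * dt + dw
    return Q, Q_tilde

def euler_cost(alpha, T=1.0, n_steps=100):
    """Exact Delta * sum_n E x_n^2 for the Euler chain started at 0."""
    dt = T / n_steps
    c = 1.0 - alpha * dt
    m2, total = 0.0, 0.0
    for _ in range(n_steps):
        total += m2 * dt
        m2 = c * c * m2 + dt        # E x_{n+1}^2 = c^2 E x_n^2 + dt
    return total
```

Both sample means estimate the same derivative; comparing `Q.std()` with `Q_tilde.std()` shows the effect of the centering on this example.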

In order to avoid the very time-consuming task of evaluating (from the simulations) $\bar k(x(s),\alpha_0)$ for each $s \le T$ in (1.5'), we usually use $\bar k(x(T),\alpha_0)$ in place of $\bar k(x(s),\alpha_0)$, with good results.

Generally, paths of the true model $x(\cdot)$ are not available, and one can only approximate them via a numerical method (say, a discrete time approximation). Reference [1] discusses two basic classes of such approximations and proves that the estimators obtained from them are good. Getting good estimators is more difficult for the ergodic problem, since we also need to truncate the infinite time interval and approximate (at least implicitly) derivatives of invariant measures, a non-trivial problem.
The proofs use a representation of the invariant measure of the diffusion process in terms of that of an imbedded Markov chain, defined by the random return times to a “recurrence set”, as well as certain Girsanov transformations defined on these “return intervals”. In order to be sure that these transformations are well defined, a bound on an exponential moment of the return time is needed. This is provided by the stability result in Section 2. Section 3 is concerned with ergodic properties of the diffusion model. The imbedded Markov chain is defined, the invariant measure of the diffusion is represented in terms of this Markov chain, and the needed recurrence ($\phi$-recurrence) properties of the chain are stated. Section 4 is concerned with the existence of the derivative of the invariant measure of the diffusion with respect to the parameter. The differentiability is first shown for the invariant measure of the imbedded chain, and this is then used to get the result for the diffusion. The differentiability is in two senses: setwise convergence and weak convergence. Some preliminary results concerning equicontinuity of certain sets of functions and invertibility of the operator $I - P(\alpha_0)$ (defined in that section) are proved first. It is also shown that the derivative of the invariant measure can be well approximated by the derivative of the transition function for large enough time.

Since the diffusion model is an “ideal” model and the paths can at best be approximated in some statistical sense, one needs to know that the natural approximations can be used with confidence in any implementation. Reference [1] dealt with two types of approximations, a discrete time model and a Markov chain approximation. Either can be used here, but we restrict our attention to the first. The model is introduced in Section 5, and some preliminary sensitivity results are stated there. Some needed stability estimates (analogous to those of Section 2), uniform in the approximation parameter, are obtained in Section 6. The main theoretical results for the approximations are in Section 7. After some preliminary results concerning the rate of convergence of certain quantities to their “invariant means”, it is shown there that the invariant measure of the discrete time approximation is differentiable with respect to the control parameter and that the derivatives converge to the derivative of the invariant measure of the diffusion; results concerning finite time approximations are also given. The results imply an important robustness of the derivatives with respect to the model. This is a new result and a very useful one from the point of view of applications, since otherwise general results concerning the existence of the derivatives for the ideal model would not have much practical relevance.

Numerical data are given in Section 8. The basic method of implementation requires the use of a discrete parameter approximation over a finite time period. The period needs to be large enough to capture the “ergodic effects”. Two methods are compared: a finite difference method, which has been altered to be fairly efficient, and several forms of our method. The comparison depends on the problem, but it is clear that for a large class of nonlinear problems our method is preferable. One should note that reasonable examples can be constructed so that any chosen method works best, so one needs to keep an open mind in any application.

The analysis has been restricted to nondegenerate diffusion models, but a similar analysis can be carried out for various related processes, provided only that ergodic results analogous to those of Section 3 are available.
2. Stability of $x(\cdot)$


In order to develop the ergodic results and use a Girsanov measure transformation method on random unbounded intervals, suitable stability properties of $x(\cdot)$ need to be proved. We will use the following assumptions. The parameter $\alpha$ will be confined to a compact interval $A_0$ with $\alpha_0$ in its interior.

A2.1. $b(\cdot)$ and $\sigma(\cdot)$ are continuous, $\sigma(\cdot)$ is bounded, and $\sigma(x)\sigma'(x) = a(x) \ge \epsilon_0 I$ for some $\epsilon_0 > 0$. For some $K < \infty$, $|k(x,\alpha)| \le K|x| + K$.

A2.2. (1.1) has a unique weak sense solution for each $x(0) = x$ and $\alpha \in A_0$.

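A2.3. There are a twice continuously differentiable Liapunov function $V(\cdot) \ge 0$, with $V(x) \to \infty$ as $|x| \to \infty$, and an $\epsilon_1 > 0$ such that

(a) $V_{xx}(x)$ is bounded and continuous,

(b) $V_x'(x)b(x,\alpha) \le -\epsilon_1 < 0$ for large $|x|$, $\alpha \in A_0$,

(c) $\limsup_{|x|\to\infty}\ \sup_{\alpha \in A_0} |V_x(x)|^2/|V_x'(x)b(x,\alpha)| < \infty$,

(d) $\limsup_{|x|\to\infty}\ \sup_{\alpha \in A_0} |\operatorname{trace} V_{xx}(x)a(x)|/|V_x'(x)b(x,\alpha)| < 2$.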

A2.4. When $b(x,\alpha) = 0$, (1.1) has a unique weak sense solution for each $x = x(0)$.

A2.5. There is a bounded continuous function $b_\alpha(\cdot,\alpha_0)$ such that, as $\delta\alpha \to 0$,
$$\delta b(x,\alpha_0,\delta\alpha)/\delta\alpha \to b_\alpha(x,\alpha_0)$$
boundedly, and uniformly on each compact $x$-set.

Remark on (A2.3). The condition does not seem to be very restrictive. It holds, in particular, for the linear case $b(x,\alpha) = A(\alpha)x$, where $A(\alpha)$ is ‘uniformly stable’ for $\alpha \in A_0$.
Remark on (A2.2). (A2.4) and the stability Theorem 2.1 imply (A2.2), but it is useful to isolate it as a separate condition.

Theorem 2.1. Assume (A2.1)-(A2.3). There is a compact set $Q$, which is the closure of its interior, such that for each compact $Q_1 \supset Q$ and $\tau_1$ defined by $\tau_1 = \min\{t : x(t) \in Q\}$, we have for small $\rho > 0$

$$\sup_{x \in Q_1 - Q}\ \sup_{\alpha \in A_0} E_x^\alpha \exp\rho\tau_1 < \infty. \tag{2.1}$$

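Proof. Let $\mathcal{L}$ denote the differential generator of $x(\cdot)$: $\mathcal{L}f(x) = f_x'(x)b(x,\alpha) + \frac12\operatorname{trace} f_{xx}(x)a(x)$. Then

$$\mathcal{L}e^{\rho V(x)} = \rho e^{\rho V(x)}\big[V_x'(x)b(x,\alpha) + \rho\operatorname{trace}(V_x(x)V_x'(x))a(x)/2 + \operatorname{trace} V_{xx}(x)a(x)/2\big].$$

Let $Q$ be large enough and $\rho$ small enough that, for $x \notin Q$ (use (A2.3)) and some $\lambda > 0$,

$$\mathcal{L}e^{\rho V(x)} \le -\rho\lambda e^{\rho V(x)}. \tag{2.2}$$

It then follows that for small $\rho$ and $x \notin Q$

$$\Big(\frac{\partial}{\partial t} + \mathcal{L}\Big)\big[e^{\lambda\rho t}e^{\rho V(x)}\big] \le 0. \tag{2.3}$$

From (2.3), Itô's Lemma and a stopping time argument, it follows that

$$E_x^\alpha e^{\lambda\rho\tau_1} \le E_x^\alpha e^{\lambda\rho\tau_1}e^{\rho V(x(\tau_1))} \le e^{\rho V(x)} \tag{2.4}$$

for small $\rho$ and $x = x(0) \notin Q$, which yields the result. Q.E.D.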

Corollary. Assume (A2.1)-(A2.3). Let $Q$ and $Q_1$ be as in the theorem. Define $\tau$ to be the first return time of $x(\cdot)$ to $Q$ after hitting $\partial Q_1$. Then, for small $\rho > 0$,

$$\sup_{\alpha \in A_0}\ \sup_{x \in \partial Q} E_x^\alpha e^{\rho\tau} < \infty. \tag{2.5}$$

The proof follows from the theorem and the non-degeneracy and is omitted.
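To illustrate the exponential moment (2.5), the following sketch estimates $E_x^\alpha e^{\rho\tau}$ by simulation for the hypothetical scalar model $dx = -\alpha x\,dt + dw$, with the illustrative choices $Q = [-0.5, 0.5]$ and $Q_1 = [-1, 1]$, started on $\partial Q$ at $x = 0.5$. The Euler step and the discrete detection of boundary crossings are assumptions of the sketch, not part of the corollary.

```python
import numpy as np

def return_times(alpha=1.0, inner=0.5, outer=1.0, dt=0.01, n_paths=20_000,
                 seed=3, max_steps=100_000):
    """Sampled return times to Q = [-inner, inner] after hitting |x| >= outer."""
    rng = np.random.default_rng(seed)
    x = np.full(n_paths, inner)                  # start on the boundary of Q
    hit_outer = np.zeros(n_paths, dtype=bool)
    done = np.zeros(n_paths, dtype=bool)
    tau = np.zeros(n_paths)
    for _ in range(max_steps):
        a = ~done                                # paths still in their cycle
        if not a.any():
            break
        xa = x[a]
        tau[a] += dt
        x[a] = xa - alpha * xa * dt + np.sqrt(dt) * rng.standard_normal(xa.size)
        hit_outer |= a & (np.abs(x) >= outer)    # reached the boundary of Q_1
        done |= a & hit_outer & (np.abs(x) <= inner)  # returned to Q
    return tau[done]

tau = return_times()
moments = {rho: np.mean(np.exp(rho * tau)) for rho in (0.05, 0.1)}
```

For small $\rho$ the sample moments are finite and increase with $\rho$; the sketch does not, of course, prove finiteness of the true moment.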
3. Ergodic Properties of (1.1)


By (A2.1)-(A2.3) and Theorem 2.1, for each $\alpha \in A_0$, $x(\cdot)$ is a recurrent strong Feller process. Let $P(x,t,A \mid \alpha)$ denote the transition function. By [2], [3], there is a unique invariant measure $\mu(\alpha)$ with $\mu(\mathbb{R}^r,\alpha) = 1$, and $P(x,t,A \mid \alpha) \to \mu(A,\alpha)$ as $t \to \infty$ for each Borel $A$. For $t > 0$, $P(x,t,\cdot \mid \alpha)$ has a bounded and nowhere zero density with respect to Lebesgue measure, and so does $\mu(\alpha)$.

We next state a representation of $\mu(\alpha)$, first used by Khazminskii [2], which is very useful for analysis. The representation is useful largely because it is hard to work with ergodic problems, and to deal with questions concerning convergence to invariant measures, when the state space is unbounded.

Let $G_1 \supset G$ be compact sets, each of which is connected and is the closure of its interior. Denote the boundaries by $\Gamma_1$ and $\Gamma$, resp., and let $G$ be strictly interior to $G_1$. Let $\Gamma$ and $\Gamma_1$ be differentiable. Define the stopping times:

$$\tau_0' = \inf\{t : x(t) \in \Gamma_1\}, \qquad \tau_1 = \inf\{t : x(t) \in \Gamma\}, \qquad \tau_1' = \inf\{t > \tau_1 : x(t) \in \Gamma_1\}.$$

For $n > 1$,

$$\tau_n = \inf\{t > \tau_{n-1}' : x(t) \in \Gamma\}, \qquad \tau_n' = \inf\{t > \tau_n : x(t) \in \Gamma_1\}.$$

For $x = x(0) \in \Gamma$, we use $\tau$ to denote $\tau_2 - \tau_1 = \tau_2$, the canonical “return” time to $\Gamma$.
By Theorem 2.1, for small $\rho > 0$,

$$\sup_{x \in \Gamma,\,\alpha \in A_0} E_x^\alpha\tau < \infty, \qquad \sup_{x \in \Gamma,\,\alpha \in A_0} E_x^\alpha e^{\rho\tau} < \infty. \tag{3.1}$$

Let $\alpha \in A_0$. Define the process $X_n = x(\tau_n)$. By [2] and (A2.1)-(A2.3), $\{X_n\}$ is a recurrent homogeneous Markov chain on $\Gamma$. Let $P(x,n,\cdot \mid \alpha)$ denote its transition probability. It has a unique invariant measure $\tilde\mu(\alpha)$.

The chain is also defined for the initial condition $x = X_0 \in G$. Even though $X_n \in \Gamma$ for $n \ge 1$, it will be useful to use $G$ as the state space in Section 6 and afterwards, in order to unify the notation with that for the approximations. The results up to Section 5 hold with this change.

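Define $r(A) = \int_0^\tau I_A(x(s))\,ds$ for Borel sets $A$. Then we can write [2,3]

$$\mu(A,\alpha) = \hat\mu(A,\alpha)/\hat\mu(\mathbb{R}^r,\alpha), \tag{3.2}$$

where

$$\hat\mu(A,\alpha) = \int_\Gamma \tilde\mu(dx,\alpha)\,E_x^\alpha r(A).$$

Hence, for bounded measurable $f(\cdot)$, we have the representation

$$\langle \mu(\alpha), f \rangle = \frac{\int_\Gamma \tilde\mu(dx,\alpha)\,E_x^\alpha\int_0^\tau f(x(s))\,ds}{\int_\Gamma \tilde\mu(dx,\alpha)\,E_x^\alpha\tau}. \tag{3.3}$$

Equation (3.3) and various approximations to it will be widely used in the sequel.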
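The representation (3.3) expresses the stationary mean as a ratio of expected cycle integrals. A minimal sketch, assuming the hypothetical scalar model $dx = -\alpha x\,dt + dw$ (whose invariant measure is $N(0, 1/(2\alpha))$), $\Gamma = \{\pm 0.5\}$, $\Gamma_1 = \{\pm 1\}$, and an Euler discretization: because the model and the test functions used are symmetric, starting every cycle at $x = +0.5$ does not change $E_x^\alpha\int_0^\tau f(x(s))\,ds$ or $E_x^\alpha\tau$, so $\tilde\mu(\alpha)$ need not be computed. Overshoot of the discrete path past $\Gamma$ is ignored.

```python
import numpy as np

def cycle_ratio(f, alpha=1.0, inner=0.5, outer=1.0, dt=0.01,
                n_cycles=20_000, seed=0, max_steps=100_000):
    """Estimate <mu(alpha), f> as (sum of cycle integrals of f) / (sum of cycle lengths)."""
    rng = np.random.default_rng(seed)
    x = np.full(n_cycles, inner)          # every cycle starts on Gamma at +inner
    hit_outer = np.zeros(n_cycles, dtype=bool)
    done = np.zeros(n_cycles, dtype=bool)
    int_f = np.zeros(n_cycles)            # int_0^tau f(x(s)) ds
    tau = np.zeros(n_cycles)              # cycle length tau
    for _ in range(max_steps):
        a = ~done
        if not a.any():
            break
        xa = x[a]
        int_f[a] += f(xa) * dt
        tau[a] += dt
        x[a] = xa - alpha * xa * dt + np.sqrt(dt) * rng.standard_normal(xa.size)
        hit_outer |= a & (np.abs(x) >= outer)          # reached Gamma_1
        done |= a & hit_outer & (np.abs(x) <= inner)   # returned to Gamma
    return int_f.sum() / tau.sum()
```

With $f \equiv 1$ the ratio is exactly one, a simple consistency check; with $f(x) = x^2$ it should be near $1/(2\alpha) = 0.5$, up to Monte Carlo and discretization error.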


Properties of $\{X_n\}$. The chain $\{X_n\}$ on state space $\Gamma$ is said to be uniformly $\phi$-recurrent (for a given measure $\phi$ on the Borel sets of $\Gamma$) if for each Borel $B \subset \Gamma$ with $\phi(B) > 0$

$$P_x^\alpha\{X_i \in B,\ \text{some } i \le m\} \to 1 \quad \text{as } m \to \infty,$$

uniformly in $x \in \Gamma$. A sufficient condition [4, p. 29] is that if $\phi(B) > 0$, there are $n < \infty$ and $\epsilon > 0$ (which can be $B$-dependent) such that

$$P_x^\alpha\{X_i \in B,\ \text{some } i \le n\} \ge \epsilon, \quad \text{all } x \in \Gamma. \tag{3.4}$$


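If the chain is $\phi$-recurrent and aperiodic, then there are $C < \infty$ and $\gamma < 1$ such that for Borel sets $B$

$$|P_x^\alpha\{X_n \in B\} - \tilde\mu(B,\alpha)| \le C\gamma^n, \tag{3.5}$$

and for bounded measurable $f(\cdot)$,

$$|E_x^\alpha f(X_n) - \bar f^\alpha| \le 2C\gamma^n\|f - \bar f^\alpha\|, \tag{3.6}$$

where $\|f\| = \sup_x |f(x)|$ and $\bar f^\alpha = \langle \tilde\mu(\alpha), f \rangle$.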
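The bounds (3.5) and (3.6) can be illustrated on a finite state space, where everything is computable. The sketch below uses a hypothetical three-state transition matrix standing in for $P(x,1,\cdot \mid \alpha)$. It computes $\sup_x\sup_B |P^n(x,B) - \tilde\mu(B)|$ (half the $\ell^1$ distance of the rows of $P^n$ from $\tilde\mu$) and checks the inequality of the form (3.6) that follows from it.

```python
import numpy as np

def stationary(P):
    """Invariant probability vector of a finite transition matrix P."""
    vals, vecs = np.linalg.eig(P.T)
    v = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
    return v / v.sum()

P = np.array([[0.5, 0.3, 0.2],        # hypothetical chain, all entries positive
              [0.2, 0.6, 0.2],
              [0.3, 0.3, 0.4]])
f = np.array([1.0, -2.0, 0.5])
mu = stationary(P)
f_bar = mu @ f

Pn = np.eye(3)
rows = []
for n in range(1, 31):
    Pn = Pn @ P
    set_dev = 0.5 * np.abs(Pn - mu).sum(axis=1).max()   # sup_x sup_B |P^n(x,B) - mu(B)|
    f_dev = np.abs(Pn @ f - f_bar).max()                 # sup_x |E_x f(X_n) - f_bar|
    rows.append((n, set_dev, f_dev))
```

Here the non-unit eigenvalues of `P` are 0.3 and 0.2, so both deviations decay roughly like $0.3^n$, which plays the role of $C\gamma^n$.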

The next theorem follows from [3, p. 339, proof of Theorem 5.1 there]. The model in the reference does not explicitly include a parameter $\alpha$, but it is easily seen from the proof of the cited theorem that the non-degeneracy, together with the fact that the moment bounds in Theorem 2.1 do not depend on $\alpha \in A_0$, implies that (3.4) holds uniformly in $\alpha \in A_0$ for some $\epsilon > 0$. In fact, we can use $n = 1$. Actually, we will only need the result for $\alpha = \alpha_0$.

Theorem 3.1. Assume (A2.1)-(A2.3). Then $\{X_n\}$ is $\phi$-recurrent, where $\phi$ is Lebesgue measure on $\Gamma$. The recurrence is uniform in $\alpha \in A_0$ in the sense that the mean recurrence times are bounded uniformly for $\alpha \in A_0$. There are $C < \infty$ and $\gamma < 1$ (not depending on $\alpha \in A_0$) such that (3.5) and (3.6) hold.

It will be seen below (Lemma 4.1) that $P(x,n,B \mid \alpha)$ is continuous in $x$, uniformly in $\alpha$ and $B$. (The continuity is proved in the reference [3] cited above, but we give a different proof, since its details will be needed elsewhere in the paper.)
4. The $\alpha$-Derivative of $\tilde\mu(\alpha)$ (Setwise Sense)

Let $C(\Gamma)$ denote the set of bounded and continuous functions on $\Gamma$, and $C_c(\Gamma)$ the centered functions: $f_1 \in C_c(\Gamma)$ if $f_1 = f - \bar f$ for $f \in C(\Gamma)$, where $\bar f = \langle \tilde\mu(\alpha_0), f \rangle$. In order to prove the differentiability of $\mu(\alpha)$ at $\alpha_0$, we first prove that of $\tilde\mu(\alpha)$, and then use (3.3).

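Definition. $\tilde\mu(\alpha)$ is said to be differentiable at $\alpha_0$ in the setwise (or weak) sense if there is a finite signed measure $\nu$ such that for each Borel set $B$

$$\nu(B) = \lim_{\delta\alpha\to 0}[\tilde\mu(B,\alpha_0+\delta\alpha) - \tilde\mu(B,\alpha_0)]/\delta\alpha.$$

$\tilde\mu(\alpha)$ is said to be differentiable at $\alpha_0$ in the sense of weak convergence (or weak$^*$ sense) if there is a finite signed measure $\nu$ such that for each $f \in C(\Gamma)$,

$$\langle \nu, f \rangle = \lim_{\delta\alpha\to 0}\langle \tilde\mu(\alpha_0+\delta\alpha) - \tilde\mu(\alpha_0), f \rangle/\delta\alpha.$$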

Definition. Let $L^\infty(\Gamma)$ denote the bounded Borel measurable functions on $\Gamma$. For any Borel set $H$, let $\mathcal{B}(H)$ denote the Borel subsets of $H$. Define the operator $P(\alpha)$ on $L^\infty(\Gamma)$ by $P(\alpha)f(x) = E_x^\alpha f(X_1)$.

Lemma 4.1. Assume (A2.1)-(A2.4). Then the set $\{P(\alpha)L^\infty(\Gamma),\ \alpha \in A_0\}$ (restricted to functions with $\|f\| \le 1$) is equicontinuous.

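Proof. Define the process $y(\cdot)$ by $y(0) = x$ and

$$dy = \sigma(y)\,dw. \tag{4.1}$$

Define

$$\zeta(t_0,t_1) = \int_{t_0}^{t_1}[\sigma^{-1}(y(s))b(y(s),\alpha)]'\,dw(s) - \frac12\int_{t_0}^{t_1}|\sigma^{-1}(y(s))b(y(s),\alpha)|^2\,ds.$$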
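Given $\epsilon > 0$, there are $T_2 > T_1 > T_0 > 0$ such that for all $\alpha \in A_0$ and $x \in \Gamma$

$$P_x^\alpha\{\tau > T_2\} \le \epsilon, \qquad P_x^\alpha\{\tau < T_1\} \le \epsilon, \tag{4.2}$$

$$E_x\exp\zeta(T_0,T_1) = 1, \tag{4.3}$$

$$\sup_{x \in \Gamma,\,\alpha \in A_0} E_x\exp 2\zeta(0,T_1) \le K_1 < \infty, \tag{4.4}$$

$$E_x^{1/2}|\exp\zeta(0,T_0) - 1|^2 \le \epsilon, \tag{4.5}$$

where $E_x$ denotes expectation for the process $y(\cdot)$ of (4.1). Let $\tau_{12} = (\tau \wedge T_2) \vee T_1$. By (4.2), we have

$$|E_x^\alpha f(X_1) - E_x^\alpha f(x(\tau_{12}))| \le 4\epsilon\|f\|.$$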

Write

$$E_x^\alpha f(x(\tau_{12})) = E_x^\alpha E_{x(T_1)}^\alpha f(x(\tau_{12})) = E_x^\alpha f_1(x(T_1)),$$

where $f_1$ is defined in the obvious way and $\|f_1\| \le \|f\|$. By use of a Girsanov measure transformation, (4.4), (4.5) and Schwarz's inequality, we can write

$$\begin{aligned}
E_x^\alpha f_1(x(T_1)) &= E_x f_1(y(T_1))\exp\zeta(0,T_1) \\
&= E_x E_{y(T_0)} f_1(y(T_1))\exp\zeta(T_0,T_1) + \epsilon' \\
&= E_x f_2(y(T_0)) + \epsilon',
\end{aligned}$$

where $|\epsilon'| \le \epsilon K_1\|f\|$, $f_2$ is defined in the obvious way and $\|f_2\| \le \|f\|$. Note that $f_2$ depends on $\alpha$ but $y(T_0)$ does not.

By the above estimates and the arbitrariness of $\epsilon$, we need only show the equicontinuity of the set $\{E_x f_2(y(T_0)) : \|f_2\| \le 1,\ \alpha \in A_0,\ f_2 \in L^\infty(\Gamma)\}$. Since $y(T_0)$ has a bounded density with respect to Lebesgue measure, using characteristic functions we can write

$$E_x f_2(y(T_0)) = \frac{1}{(2\pi)^r}\int f_2(y)\,dy\Big[\int(\exp -iu'y)\,E_x\exp iu'y(T_0)\,du\Big].$$

We have

$$|E_x\exp iu'y(T_0)| \le \exp -\theta(|u|^2),$$

where $\theta(\cdot)$ can be chosen independently of $x \in \Gamma$ and $\alpha \in A_0$. Also, the bracketed term is the density (modulo a proportionality factor $(2\pi)^r$) of $y(T_0)$ and is bounded by $\exp -\theta(|y|^2)$, where $\theta(\cdot)$ can be chosen independently of $x \in \Gamma$ and $\alpha \in A_0$. Thus, we need only prove that

$$E_x\exp iu'y(T_0)$$

is $x$-continuous on each bounded $u$-set. But this follows from the Feller property of $y(\cdot)$. Q.E.D.

Corollary. Assume (A2.1)-(A2.4). Then the transition function $P(x,n,B \mid \alpha) = E_x^\alpha I_B(X_n)$ is continuous in $x$, uniformly in $B$, $n$ and $\alpha \in A_0$. Also $\tilde\mu(B,\alpha_0+\delta\alpha) \to \tilde\mu(B,\alpha_0)$, uniformly in $B \in \mathcal{B}(\Gamma)$.

Proof. The first assertion is a direct consequence of the lemma. Let $g \in L^\infty(\Gamma)$, $\|g\| \le 1$. Then, by the invariance of $\tilde\mu(\alpha)$,

$$\langle \tilde\mu(\alpha_0+\delta\alpha), g \rangle = \int\tilde\mu(dx,\alpha_0+\delta\alpha)\,E_x^{\alpha_0+\delta\alpha}g(X_1).$$

A measure transformation argument and the continuity of $b(\cdot)$ can be used to show that, as $\delta\alpha \to 0$, $E_x^{\alpha_0+\delta\alpha}g(X_1)$ converges to $E_x^{\alpha_0}g(X_1)$, uniformly in $x \in \Gamma$. The latter function is continuous on $\Gamma$ by the lemma. In fact, the continuity and the convergence are uniform in $g$. From this, the invariance of $\tilde\mu(\alpha_0)$ and the weak convergence $\tilde\mu(\alpha_0+\delta\alpha) \Rightarrow \tilde\mu(\alpha_0)$ (see Lemma 4.3 below), we have

$$\lim_{\delta\alpha\to 0}\langle \tilde\mu(\alpha_0+\delta\alpha), g \rangle = \int\tilde\mu(dx,\alpha_0)\,E_x^{\alpha_0}g(X_1) = \int\tilde\mu(dx,\alpha_0)\,g(x),$$

where the convergence is uniform in $g$: $\|g\| \le 1$. Q.E.D.

The next lemma will be used to get the differentiability of $\mu(\alpha)$ at $\alpha_0$ from that of $\tilde\mu(\alpha)$, via (3.3).

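Lemma 4.2. Assume (A2.1)-(A2.5). Then for $f \in L^\infty(\Gamma)$, as $\delta\alpha \to 0$,

$$[P(\alpha_0+\delta\alpha) - P(\alpha_0)]f/\delta\alpha$$

converges (uniformly in $x$) to the function with values $E_x^{\alpha_0}f(X_1)Z(\tau,\alpha_0)$. The limit is continuous and the convergence is uniform for $f : \|f\| \le 1$. The set

$$\{E_x^\alpha Z(\tau,\alpha_0)f(X_1) : \|f\| \le 1,\ f \in L^\infty(\Gamma),\ \alpha \in A_0\}$$

is equicontinuous. The same result holds for the convergence

$$\frac{1}{\delta\alpha}\Big[E_x^{\alpha_0+\delta\alpha}\int_0^\tau f(x(s))\,ds - E_x^{\alpha_0}\int_0^\tau f(x(s))\,ds\Big] \to E_x^{\alpha_0}\int_0^\tau f(x(s))\,ds\,Z(\tau,\alpha_0) = E_x^{\alpha_0}\int_0^\tau f(x(s))Z(s,\alpha_0)\,ds.$$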

Proof. The proof of the last assertion is very similar to that of the prior assertions and will be omitted. By an argument analogous to that of Lemma 4.1, we can prove the equicontinuity of the cited set of functions. We will prove only the first assertion of the lemma. For $T < \infty$, via a Girsanov measure transformation,

$$\begin{aligned}
\frac{E_x^{\alpha_0+\delta\alpha}f(x(\tau\wedge T)) - E_x^{\alpha_0}f(x(\tau\wedge T))}{\delta\alpha} &= E_x^{\alpha_0}f(x(\tau\wedge T))\,[\exp\zeta(0,T;\alpha_0,\delta\alpha) - 1]/\delta\alpha \\
&= E_x^{\alpha_0}f(x(\tau\wedge T))\,[\exp\zeta(0,\tau\wedge T;\alpha_0,\delta\alpha) - 1]/\delta\alpha.
\end{aligned} \tag{4.6}$$

We have, by (A2.5) and Theorem 2.1,

$$\lim_{\delta\alpha\to 0}\ \lim_{T\to\infty} E_x^{\alpha_0}\Big|\frac{\exp\zeta(0,\tau\wedge T;\alpha_0,\delta\alpha) - 1}{\delta\alpha} - Z(\tau,\alpha_0)\Big|^2 = 0,$$

where the limit is attained uniformly in $x \in \Gamma$. The first assertion of the lemma follows from this and (4.6). Q.E.D.

The next corollary shows that the setwise derivative of $\tilde\mu(\alpha)$ at $\alpha_0$ is absolutely continuous with respect to $\tilde\mu(\alpha_0)$.

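Corollary. Assume (A2.1)-(A2.4). Define the set function $\nu$ by

$$\nu(B) = \lim_{\delta\alpha\to 0}\frac{1}{\delta\alpha}\int_\Gamma\tilde\mu(dx,\alpha_0)\,[P(x,1,B \mid \alpha_0+\delta\alpha) - P(x,1,B \mid \alpha_0)].$$

Then there is $G \in L^1(\tilde\mu(\alpha_0))$ such that $\langle \tilde\mu(\alpha_0), G \rangle = 0$ and

$$\nu(B) = \int_B\tilde\mu(dx,\alpha_0)\,G(x).$$

The limit is uniform in $B$.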

Proof. By the lemma, the limit is

$$\int_\Gamma\tilde\mu(dx,\alpha_0)\,E_x^{\alpha_0}Z(\tau,\alpha_0)I_B(X_1),$$

and the limit is taken on uniformly in $B$. (In fact, $E_x^{\alpha_0}Z(\tau,\alpha_0)I_B(X_1)$ is continuous, uniformly in $B$.) Both $\tilde\mu(\alpha_0)$ and the measure defined by the limit are mutually absolutely continuous with respect to Lebesgue measure, since the transition probability $P(x,1,\cdot \mid \alpha_0)$ is. Let $G$ denote the Radon-Nikodym derivative of $\nu$ with respect to $\tilde\mu(\alpha_0)$. Since $E_x^{\alpha_0}Z(\tau,\alpha_0)I_\Gamma(X_1) = 0$, we have $\langle \tilde\mu(\alpha_0), G \rangle = 0$. Q.E.D.

Lemma 4.3. Assume (A2.1)-(A2.4). Then $\tilde\mu(\alpha_0+\delta\alpha) \Rightarrow \tilde\mu(\alpha_0)$.

Proof. The proof follows from the uniqueness of $\tilde\mu(\alpha_0)$ and the convergence $P(x,1,B \mid \alpha_0+\delta\alpha) \to P(x,1,B \mid \alpha_0)$, uniformly for $x \in \Gamma$ (Lemma 4.1), and the details are omitted.
Definition. Let $L_c^\infty(\Gamma) \subset L^\infty(\Gamma)$ be the ‘centered’ subset for which $\langle \tilde\mu(\alpha_0), f \rangle = 0$. We identify functions in $L^\infty(\Gamma)$ which are equal a.e. (Lebesgue measure).

The following lemma is a key result for proving the differentiability of $\tilde\mu(\alpha)$ at $\alpha_0$. The representations used occur throughout the sequel.

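Lemma 4.4. Assume (A2.1)-(A2.4). Then $(I - P(\alpha_0)) : L_c^\infty(\Gamma) \to L_c^\infty(\Gamma)$ is invertible.

Proof. The fact that $P(\alpha_0)$ maps $L_c^\infty(\Gamma)$ into $L_c^\infty(\Gamma)$ follows from the fact that $\tilde\mu(\alpha_0)$ is an invariant measure for the transition function $P(x,n,\cdot \mid \alpha_0)$. We prove the invertibility by simply exhibiting the inverse. Let $f \in L_c^\infty(\Gamma)$. By (3.6), $\|P^n(\alpha_0)f\| \le 2C\gamma^n\|f\|$, so the series below converges uniformly, and it is easily seen from the definition of $(I - P(\alpha_0))$ that the “inverse” defined by

$$(I - P(\alpha_0))^{-1}f(x) = \sum_{n=0}^\infty P^n(\alpha_0)f(x) = \sum_{n=0}^\infty E_x^{\alpha_0}f(X_n) \tag{4.7}$$

satisfies our needs. Q.E.D.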
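For a finite chain, the series (4.7) can be evaluated directly. A minimal sketch, assuming a hypothetical three-state matrix in place of $P(\alpha_0)$: the truncated sum $h = \sum_{n<N}P^n g$ for a centered $g$ solves $(I - P)h = g$ up to a geometrically small remainder, and $h$ is itself centered.

```python
import numpy as np

def stationary(P):
    """Invariant probability vector of a finite transition matrix P."""
    vals, vecs = np.linalg.eig(P.T)
    v = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
    return v / v.sum()

def neumann_inverse(P, g, n_terms=200):
    """Truncated version of (4.7): sum_{n < n_terms} P^n g, for centered g."""
    h = np.zeros_like(g)
    term = g.copy()
    for _ in range(n_terms):
        h += term
        term = P @ term          # P^{n+1} g, i.e. x -> E_x g(X_{n+1})
    return h

P = np.array([[0.5, 0.3, 0.2],   # hypothetical chain
              [0.2, 0.6, 0.2],
              [0.3, 0.3, 0.4]])
mu = stationary(P)
f = np.array([1.0, -2.0, 0.5])
g = f - mu @ f                   # centered: <mu, g> = 0
h = neumann_inverse(P, g)
```

For an uncentered $g$ the terms $P^n g$ tend to the nonzero constant $\langle\tilde\mu, g\rangle$ and the series diverges, which is why the inverse is taken on the centered subspace.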

Corollary. Assume (A2.1)-(A2.4). Then $(I - P(\alpha_0)) : C_c(\Gamma) \to C_c(\Gamma)$ is invertible.

Proof. By Lemma 4.1, $P(\alpha_0)C_c(\Gamma) \subset C_c(\Gamma)$. The rest of the proof is as for the lemma. Q.E.D.

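Theorem 4.1. Assume (A2.1)-(A2.5). Then $\tilde\mu_\alpha(\alpha_0)$ exists in the sense of setwise convergence and satisfies, for $f \in L^\infty(\Gamma)$,

$$\langle \tilde\mu_\alpha(\alpha_0), f \rangle = \langle \tilde\mu(\alpha_0), P_\alpha(\alpha_0)f \rangle + \langle \tilde\mu_\alpha(\alpha_0), P(\alpha_0)f \rangle, \tag{4.8}$$

where

$$P_\alpha(\alpha_0)f(x) = \frac{\partial}{\partial\alpha}E_x^\alpha f(X_1)\Big|_{\alpha=\alpha_0}.$$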
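Proof. For $f \in L^\infty(\Gamma)$, we have

$$\begin{aligned}
\langle \tilde\mu(\alpha) - \tilde\mu(\alpha_0), f \rangle &= \langle \tilde\mu(\alpha), P(\alpha)f \rangle - \langle \tilde\mu(\alpha_0), P(\alpha_0)f \rangle \\
&= \langle \tilde\mu(\alpha) - \tilde\mu(\alpha_0), P(\alpha_0)f \rangle + \langle \tilde\mu(\alpha_0), (P(\alpha) - P(\alpha_0))f \rangle \\
&\quad + \langle \tilde\mu(\alpha) - \tilde\mu(\alpha_0), (P(\alpha) - P(\alpha_0))f \rangle.
\end{aligned} \tag{4.9}$$

Write $\delta\tilde\mu(\alpha) = \tilde\mu(\alpha) - \tilde\mu(\alpha_0)$ and $\delta P(\alpha) = P(\alpha) - P(\alpha_0)$. Then (4.9) yields

$$\langle \delta\tilde\mu(\alpha)/\delta\alpha,\ (I - P(\alpha_0))f \rangle = \langle \tilde\mu(\alpha_0),\ \delta P(\alpha)f/\delta\alpha \rangle + \langle \delta\tilde\mu(\alpha),\ \delta P(\alpha)f/\delta\alpha \rangle. \tag{4.10}$$

By Lemma 4.2 and either Lemma 4.3 or the Corollary to Lemma 4.1, the second right-hand term in (4.10) goes to zero as $\delta\alpha \to 0$ (uniformly in $f : \|f\| \le 1$).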

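For $g \in L_c^\infty(\Gamma)$, define (use Lemma 4.4) $f = (I - P(\alpha_0))^{-1}g$. By Lemmas 4.2 and 4.4,

$$\frac{\delta P(\alpha)}{\delta\alpha}(I - P(\alpha_0))^{-1}g$$

converges (uniformly in $x$) to the function with values

$$E_x^{\alpha_0}f(X_1)Z(\tau,\alpha_0) = E_x^{\alpha_0}\Big[Z(\tau,\alpha_0)\sum_{n=0}^\infty E_y^{\alpha_0}g(X_n)\Big|_{y=X_1}\Big] = \tilde g(x),$$

which is in $C_c(\Gamma)$. Hence

$$\lim_{\delta\alpha\to 0}\langle \delta\tilde\mu(\alpha)/\delta\alpha,\ g \rangle = \langle \tilde\mu(\alpha_0),\ \tilde g \rangle. \tag{4.11}$$

Since $g \in L_c^\infty(\Gamma)$, $L_c^\infty(\Gamma) = L^\infty(\Gamma)$ modulo constant functions, and constants are annihilated by $\delta\tilde\mu(\alpha)$, (4.11) gives the desired setwise convergence.

The formula (4.8) follows in a similar way. Q.E.D.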
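The identity behind (4.8) and (4.11) is that $\langle \tilde\mu_\alpha(\alpha_0), g \rangle = \langle \tilde\mu(\alpha_0), P_\alpha(\alpha_0)(I - P(\alpha_0))^{-1}g \rangle$ for centered $g$, while $\tilde\mu_\alpha(\alpha_0)$ annihilates constants. The following sketch checks this on a hypothetical finite-state family $P(\alpha) = (1-\alpha)A + \alpha B$ (so $P_\alpha = B - A$), comparing against a central finite difference of the stationary distribution. It illustrates the algebra only, not the diffusion setting.

```python
import numpy as np

def stationary(P):
    """Invariant probability vector of a finite transition matrix P."""
    vals, vecs = np.linalg.eig(P.T)
    v = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
    return v / v.sum()

A = np.array([[0.5, 0.3, 0.2],   # hypothetical transition matrices
              [0.2, 0.6, 0.2],
              [0.3, 0.3, 0.4]])
B = np.array([[0.1, 0.1, 0.8],
              [0.6, 0.2, 0.2],
              [0.3, 0.5, 0.2]])

def P_of(alpha):
    return (1.0 - alpha) * A + alpha * B

def derivative_functional(alpha0, f, n_terms=500):
    """<mu_alpha(alpha0), f> computed as <mu, P_alpha (I - P)^{-1} (f - f_bar)>."""
    P = P_of(alpha0)
    mu = stationary(P)
    g = f - mu @ f                       # centered part of f
    h = np.zeros_like(g)
    term = g.copy()
    for _ in range(n_terms):             # Neumann series, as in (4.7)
        h += term
        term = P @ term
    P_alpha = B - A                      # derivative of P(alpha) for this family
    return mu @ (P_alpha @ h)

f = np.array([1.0, -2.0, 0.5])
alpha0, eps = 0.4, 1e-5
formula = derivative_functional(alpha0, f)
finite_diff = (stationary(P_of(alpha0 + eps)) @ f
               - stationary(P_of(alpha0 - eps)) @ f) / (2 * eps)
```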

Corollary. Assume (A2.1)-(A2.5). Then $\tilde\mu_\alpha(\alpha_0)$ exists in the sense of weak convergence.

Remark. The corollary is obviously a special case of the theorem. But it can be proved directly via the method of proof of the theorem, simply by replacing all $L_c^\infty(\Gamma)$ by $C_c(\Gamma)$. This remark will be useful when working with the approximations in Section 7, since there we will have to work with weak convergence only.

Now that the existence of $\tilde\mu_\alpha(\alpha_0)$ is established, we can turn our attention to $\mu_\alpha(\alpha_0)$.


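Theorem 4.2. Assume (A2.1)-(A2.5). Then $\mu_\alpha(\alpha_0)$ exists in the sense of setwise convergence, and for $f \in L^\infty(\mathbb{R}^r)$,

$$\begin{aligned}
\langle \mu_\alpha(\alpha_0), f \rangle &= \frac{d}{d\alpha}\left.\frac{\int_\Gamma\tilde\mu(dx,\alpha)\,E_x^\alpha\int_0^\tau f(x(s))\,ds}{\int_\Gamma\tilde\mu(dx,\alpha)\,E_x^\alpha\tau}\right|_{\alpha=\alpha_0} \\
&= \frac{1}{\hat\mu(\mathbb{R}^r,\alpha_0)}\Big[\int_\Gamma\tilde\mu(dx,\alpha_0)\,E_x^{\alpha_0}\int_0^\tau f(x(s))Z(s,\alpha_0)\,ds + \int_\Gamma\tilde\mu_\alpha(dx,\alpha_0)\,E_x^{\alpha_0}\int_0^\tau f(x(s))\,ds\Big] \\
&\quad - \frac{\langle \hat\mu(\alpha_0), f \rangle}{(\hat\mu(\mathbb{R}^r,\alpha_0))^2}\Big[\int_\Gamma\tilde\mu(dx,\alpha_0)\,E_x^{\alpha_0}\int_0^\tau Z(s,\alpha_0)\,ds + \int_\Gamma\tilde\mu_\alpha(dx,\alpha_0)\,E_x^{\alpha_0}\tau\Big].
\end{aligned} \tag{4.12}$$

Also, $\mu_\alpha(\alpha_0)$ is absolutely continuous with respect to Lebesgue measure and has finite variation.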


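Proof. Let $f \in L^\infty(\mathbb{R}^r)$. Define $\delta\mu(\alpha) = \mu(\alpha_0+\delta\alpha) - \mu(\alpha_0)$ and define $\delta\tilde\mu(\alpha)$ analogously. Define the operator $\bar P(\alpha)$ on $L^\infty(\mathbb{R}^r)$ by $\bar P(\alpha)f(x) = E_x^\alpha\int_0^\tau f(x(s))\,ds$. Let $e$ denote the function which is identically unity. We need to show the differentiability of

$$\langle \tilde\mu(\alpha), \bar P(\alpha)f \rangle / \langle \tilde\mu(\alpha), \bar P(\alpha)e \rangle = \langle \mu(\alpha), f \rangle.$$

It will be sufficient to show the differentiability of the numerator only, since the denominator is the numerator with $f = e$. This derivative will be the first bracketed term in (4.12). We can write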

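$$\begin{aligned}
&\frac{1}{\delta\alpha}\big[\langle \tilde\mu(\alpha_0+\delta\alpha), \bar P(\alpha_0+\delta\alpha)f \rangle - \langle \tilde\mu(\alpha_0), \bar P(\alpha_0)f \rangle\big] \\
&\quad = \Big\langle \frac{\delta\tilde\mu(\alpha)}{\delta\alpha}, \bar P(\alpha_0)f \Big\rangle + \Big\langle \tilde\mu(\alpha_0), \frac{\bar P(\alpha_0+\delta\alpha) - \bar P(\alpha_0)}{\delta\alpha}f \Big\rangle + \Big\langle \delta\tilde\mu(\alpha), \frac{\bar P(\alpha_0+\delta\alpha) - \bar P(\alpha_0)}{\delta\alpha}f \Big\rangle.
\end{aligned}$$

By Lemma 4.2, the second term on the right converges to the first term in the first bracket on the right-hand side of (4.12). The first term on the right converges to the second term in the first bracket on the right-hand side of (4.12) by Theorem 4.1 and the fact that $\bar P(\alpha_0)f \in C(\Gamma)$. Similarly, the last term on the right goes to zero. The representation (4.12) implies the absolute continuity assertion, since it equals zero if $f = 0$ a.e. (Lebesgue measure). It also implies the finite variation. Q.E.D.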
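A hedged Monte Carlo sketch of (4.12) for the hypothetical scalar model $dx = -\alpha x\,dt + dw$ with $f(x) = x^2$, for which $\langle \mu(\alpha), f \rangle = 1/(2\alpha)$ and the exact derivative at $\alpha_0 = 1$ is $-0.5$. The sets are the illustrative choices $\Gamma = \{\pm 0.5\}$, $\Gamma_1 = \{\pm 1\}$. By symmetry, $\tilde\mu(\alpha)$ puts mass $1/2$ on each point of $\Gamma$ for every $\alpha$, so the $\tilde\mu_\alpha$ terms of (4.12) vanish, and for even $f$ every cycle may be started at $+0.5$. $Z$ uses $b_\alpha(x) = -x$ on an Euler grid; the Euler step and the neglect of overshoot are assumptions of the sketch.

```python
import numpy as np

def cycle_derivative(f, alpha0=1.0, inner=0.5, outer=1.0, dt=0.01,
                     n_cycles=40_000, seed=4, max_steps=100_000):
    """Estimates of <mu(alpha0), f> and of <mu_alpha(alpha0), f> via (3.3) and (4.12)."""
    rng = np.random.default_rng(seed)
    x = np.full(n_cycles, inner)          # each cycle restarts at +inner
    Z = np.zeros(n_cycles)                # discrete Z(s, alpha0)
    hit_outer = np.zeros(n_cycles, dtype=bool)
    done = np.zeros(n_cycles, dtype=bool)
    int_f = np.zeros(n_cycles)            # int_0^tau f(x(s)) ds
    int_fZ = np.zeros(n_cycles)           # int_0^tau f(x(s)) Z(s) ds
    int_Z = np.zeros(n_cycles)            # int_0^tau Z(s) ds
    tau = np.zeros(n_cycles)
    for _ in range(max_steps):
        a = ~done
        if not a.any():
            break
        xa, Za = x[a], Z[a]
        fa = f(xa)
        int_f[a] += fa * dt
        int_fZ[a] += fa * Za * dt
        int_Z[a] += Za * dt
        tau[a] += dt
        dw = np.sqrt(dt) * rng.standard_normal(xa.size)
        Z[a] = Za - xa * dw               # b_alpha(x) = -x, sigma = 1
        x[a] = xa - alpha0 * xa * dt + dw
        hit_outer |= a & (np.abs(x) >= outer)
        done |= a & hit_outer & (np.abs(x) <= inner)
    mean_tau = tau.mean()
    ratio = int_f.mean() / mean_tau                              # (3.3)
    deriv = (int_fZ.mean() - ratio * int_Z.mean()) / mean_tau    # (4.12), tilde-mu_alpha terms = 0
    return ratio, deriv
```

The first returned value estimates $\hat\mu$-normalized quantities as in (3.3); the second is the quotient-rule combination of the two brackets in (4.12), with $\hat\mu(\mathbb{R}^r,\alpha_0)$ estimated by the mean cycle length.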


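Theorem 4.3 essentially says that the $\alpha$-derivative of $E_x^\alpha f(x(t))$, averaged over the initial condition with respect to $\mu(\alpha_0)$, equals that of $\langle \mu(\alpha), f \rangle$ for large $t$.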

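Theorem 4.3. Assume (A2.1)-(A2.5). Then for $f \in L^\infty(\mathbb{R}^r)$,

$$\frac{\partial}{\partial\alpha}\langle \mu(\alpha), f \rangle\Big|_{\alpha=\alpha_0} = \lim_{t\to\infty}\int\mu(dx,\alpha_0)\big(E_x^{\alpha_0}f(x(t))\big)_\alpha, \tag{4.13}$$

$$\frac{\partial}{\partial\alpha}\langle \tilde\mu(\alpha), f \rangle\Big|_{\alpha=\alpha_0} = \lim_{n\to\infty}\int\tilde\mu(dx,\alpha_0)\big(E_x^{\alpha_0}f(X_n)\big)_\alpha, \tag{4.14}$$

and the limits exist. Here $(E_x^{\alpha_0}f(x(t)))_\alpha$ denotes the $\alpha$-derivative of $E_x^\alpha f(x(t))$ at $\alpha_0$.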
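Proof. By the differentiability proved in Theorem 4.2 and the invariance of $\mu(\alpha)$, we can write

$$\frac{\partial}{\partial\alpha}(\mu(\alpha),f)\Big|_{\alpha=\alpha_0} = \frac{\partial}{\partial\alpha}\int\mu(dx,\alpha)E_x^\alpha f(x(t))\Big|_{\alpha=\alpha_0}$$

$$= \int\mu_\alpha(dx,\alpha_0)E_x^{\alpha_0}f(x(t)) + \int\mu(dx,\alpha_0)\big(E_x^\alpha f(x(t))\big)_\alpha\Big|_{\alpha=\alpha_0}. \tag{4.15}$$

As $t \to \infty$, $E_x^{\alpha_0}f(x(t)) \to (\mu(\alpha_0),f)$ for $\mu(\alpha_0)$-almost all $x$. Since $\mu_\alpha(\alpha_0)$ is absolutely continuous with respect to Lebesgue measure (Theorem 4.2), and $\mu(\alpha_0)$ and Lebesgue measure are mutually absolutely continuous, $\mu_\alpha(\alpha_0)$ is absolutely continuous with respect to $\mu(\alpha_0)$. Also $(\mu_\alpha(\alpha_0),\text{constant function}) = 0$. These facts, together with the finite variation of $\mu_\alpha(\alpha_0)$ and the boundedness of $f$, imply (by dominated convergence) that the first term on the right-hand side of (4.15) goes to zero as $t \to \infty$, which yields the assertion concerning (4.13). The expression (4.14) is proved in the same way. Q.E.D.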
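The following Python sketch is not from the paper; it only illustrates the mechanism of Theorem 4.3 on an assumed example where everything is explicit: the scalar Ornstein-Uhlenbeck process $dx = -\alpha x\,dt + dw$ with $f(x) = x^2$. Here $\mu(\alpha)$ is Gaussian with mean zero and variance $1/(2\alpha)$, so $(\mu_\alpha(\alpha_0),f) = -1/(2\alpha_0^2)$, and $E_x^\alpha x(t)^2 = x^2e^{-2\alpha t} + (1-e^{-2\alpha t})/(2\alpha)$. Since $f$ is unbounded and the state space is not compact, the example lies outside (A2.1)-(A2.5); it is meant only to show the integrand of (4.13) approaching the ergodic derivative as $t$ grows.

```python
import math


def ou_second_moment_alpha_derivative(x, t, alpha):
    """d/d(alpha) of E_x^alpha x(t)^2 for the OU process dx = -alpha x dt + dw."""
    e = math.exp(-2.0 * alpha * t)
    # derivative of x^2 e^{-2 alpha t} + (1 - e^{-2 alpha t}) / (2 alpha)
    return -2.0 * t * x * x * e + t * e / alpha - (1.0 - e) / (2.0 * alpha ** 2)


def averaged_derivative(t, alpha0):
    """Integrand of (4.13) averaged over mu(alpha0) = N(0, 1/(2 alpha0)).

    The derivative above is affine in x^2, so averaging over mu(alpha0) only
    needs E[x^2] = 1/(2 alpha0); evaluating at x = sqrt(E[x^2]) does exactly that.
    """
    return ou_second_moment_alpha_derivative(math.sqrt(1.0 / (2.0 * alpha0)),
                                             t, alpha0)


if __name__ == "__main__":
    alpha0 = 2.0
    target = -1.0 / (2.0 * alpha0 ** 2)  # (mu_alpha(alpha0), x^2) = -0.125
    for t in (0.0, 0.5, 1.0, 3.0):
        print(t, averaged_derivative(t, alpha0), target)
```

For this example the averaged integrand works out to $-(1-e^{-2\alpha_0t})/(2\alpha_0^2)$; the missing piece $-e^{-2\alpha_0t}/(2\alpha_0^2)$ is exactly the first term of (4.15), which vanishes as $t \to \infty$.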
5. A Discrete Time Approximation


Since the paths of $x(\cdot)$ and $w(\cdot)$ are not physically available, we cannot evaluate (1.5), or use Theorem 4.2 or 4.3 as stated, to get estimates of the derivatives $(\mu_\alpha(\alpha_0),f)$ from paths of $x(\cdot)$ or $w(\cdot)$. We need to work with computable approximations to $x(\cdot)$ and $w(\cdot)$. In [1], two types of approximations were used for the finite time problem: a discrete time approximation and a Markov chain approximation. Each has its own advantages, but simulation studies indicate that their overall numerical properties are similar. We work with the discrete time approximation here. This section defines the approximation; some necessary stability results are proved in the next section. Among other things, the robustness of the approximations to derivatives of invariant measures and ergodic costs will become clear.

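For $\Delta > 0$ and $\delta w(n\Delta) = w(n\Delta+\Delta) - w(n\Delta)$, define $\{X_n^\Delta\}$ by $X_0^\Delta = x$ and

$$X^\Delta_{n+1} = X^\Delta_n + \Delta\,b(X^\Delta_n,\alpha_0) + \sigma(X^\Delta_n)\,\delta w(n\Delta). \tag{5.1}$$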

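Define the interpolation $x^\Delta(\cdot)$ to be the piecewise constant (on intervals $[n\Delta,n\Delta+\Delta)$) process with $x^\Delta(n\Delta) = X^\Delta_n$. Define $Z^\Delta(\cdot,\alpha_0)$ to be the piecewise constant (on the same intervals) process with value at $n\Delta$:

$$Z^\Delta(n\Delta,\alpha_0) = \sum_{i=0}^{n-1}\big[\sigma^{-1}(X_i^\Delta)b_\alpha(X_i^\Delta,\alpha_0)\big]'\delta w(i\Delta) = \sum_{i=0}^{n-1}\big[b_\alpha'(X_i^\Delta,\alpha_0)a^{-1}(X_i^\Delta)\big]\big[\delta X_i^\Delta - \Delta b(X_i^\Delta,\alpha_0)\big],$$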
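where $\delta X_i^\Delta = X_{i+1}^\Delta - X_i^\Delta$. For $T = N\Delta$, [1, Section 4] shows that

$$Q^\Delta(\alpha_0) = \sum_{n=0}^{N-1}\Delta\big[k(X_n^\Delta,\alpha_0)Z^\Delta(n\Delta,\alpha_0) + k_\alpha(X_n^\Delta,\alpha_0)\big] + k_{0,\alpha}(X_N^\Delta,\alpha_0) + k_0(X_N^\Delta,\alpha_0)Z^\Delta(N\Delta,\alpha_0)$$

or the centered form

$$\tilde Q^\Delta(\alpha_0) = \sum_{n=0}^{N-1}\Delta\Big[\big(k(X_n^\Delta,\alpha_0) - E_x^{\alpha_0}k(X_n^\Delta,\alpha_0)\big)Z^\Delta(n\Delta,\alpha_0) + k_\alpha(X_n^\Delta,\alpha_0)\Big] + k_{0,\alpha}(X_N^\Delta,\alpha_0) + \big(k_0(X_N^\Delta,\alpha_0) - E_x^{\alpha_0}k_0(X_N^\Delta,\alpha_0)\big)Z^\Delta(N\Delta,\alpha_0)$$

are appropriate approximations to (1.5). The centered form $\tilde Q^\Delta(\alpha_0)$ will have the smaller variance.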

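In fact we have

$$E_x^{\alpha_0}Q^\Delta(\alpha_0) = E_x^{\alpha_0}\tilde Q^\Delta(\alpha_0) = \frac{\partial}{\partial\alpha}E_x^\alpha\Big[\int_0^Tk(x^\Delta(s),\alpha)\,ds + k_0(x^\Delta(T),\alpha)\Big]_{\alpha=\alpha_0},$$

and we have the weak convergence

$$(Z^\Delta(\cdot,\alpha_0),Q^\Delta(\alpha_0),\tilde Q^\Delta(\alpha_0),x^\Delta(\cdot)) \Rightarrow (Z(\cdot,\alpha_0),Q(\alpha_0),\tilde Q(\alpha_0),x(\cdot)).$$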

We will obtain various ‘infinite time’ extensions of this result in Section 7. Analogously to the comment below (1.5), to reduce computation while exploiting the variance reduction advantages of the centering, in the simulations we replace $E_x^{\alpha_0}k(X_n^\Delta,\alpha_0)$ by $E^{\alpha_0}k(X_n^\Delta,\alpha_0)$, with good results in general.
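A minimal Python sketch of the scheme (5.1), the accumulated $Z^\Delta(n\Delta,\alpha_0)$, and one sample of the uncentered $Q^\Delta(\alpha_0)$, assuming a scalar state, scalar noise, and scalar parameter; the function and argument names are hypothetical. The usage relies on an assumed toy model, $b(x,\alpha) = \alpha$, $\sigma \equiv 1$, $k(x,\alpha) = x$, $k_0 \equiv 0$, for which the $\alpha$-derivative of $E_x^\alpha\sum_{n<N}\Delta X_n^\Delta$ is exactly $\sum_{n<N}\Delta\cdot n\Delta$, so the sample mean of $Q^\Delta$ should be close to that number.

```python
import math
import random


def q_delta_sample(x0, alpha0, delta, n_steps, b, b_alpha, sigma,
                   k, k_alpha, k0, k0_alpha, rng):
    """One sample of the uncentered estimator Q^Delta(alpha0), scalar case."""
    x = x0
    z = 0.0  # Z^Delta(n * delta, alpha0); Z^Delta(0) = 0
    q = 0.0
    for _ in range(n_steps):
        # running term Delta * [k(X_n) Z^Delta(n delta) + k_alpha(X_n)]
        q += delta * (k(x, alpha0) * z + k_alpha(x, alpha0))
        dw = rng.gauss(0.0, math.sqrt(delta))
        z += b_alpha(x, alpha0) / sigma(x) * dw      # [sigma^{-1} b_alpha]' dw
        x += delta * b(x, alpha0) + sigma(x) * dw    # Euler step (5.1)
    # terminal terms k_{0,alpha}(X_N) + k_0(X_N) Z^Delta(N delta)
    return q + k0_alpha(x, alpha0) + k0(x, alpha0) * z


if __name__ == "__main__":
    rng = random.Random(1)
    delta, n_steps = 0.1, 10
    model = dict(b=lambda x, a: a, b_alpha=lambda x, a: 1.0,
                 sigma=lambda x: 1.0, k=lambda x, a: x,
                 k_alpha=lambda x, a: 0.0, k0=lambda x, a: 0.0,
                 k0_alpha=lambda x, a: 0.0)
    samples = [q_delta_sample(0.0, 1.0, delta, n_steps, rng=rng, **model)
               for _ in range(20000)]
    exact = sum(delta * n * delta for n in range(n_steps))
    print(sum(samples) / len(samples), exact)
```

The centered form $\tilde Q^\Delta$ differs only in subtracting a centering value from $k$ and $k_0$ before multiplying by $Z^\Delta$; since $Z^\Delta$ has mean zero, this leaves the mean unchanged.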
6. Stability of the Approximation


An analog of Theorem 2.1 is needed for the $\{X_n^\Delta\}$ process. We will require the following additional condition.

A6.1.

(a) $V_x'(x)b(x,\alpha) \to -\infty$ as $|x| \to \infty$, uniformly for $\alpha \in A_0$.

(b) $$\liminf_{|x|\to\infty}\ \inf_{\alpha\in A_0}\ \frac{-V_x'(x)b(x,\alpha)}{|b(x,\alpha)|^2} > 0.$$


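Theorem 6.1. Assume (A2.1)-(A2.3) and (A6.1). There is a compact set $Q$ such that for each compact $Q_1 \supset Q$, we have for small $\rho > 0$, $\delta > 0$, and $\Delta < \delta$,

$$\sup_{\alpha\in A_0}\ \sup_{x\in Q_1-Q}E_x^\alpha\exp\rho\tau^\Delta < \infty, \tag{6.1}$$

where $\tau^\Delta = \min\{t: x^\Delta(t) \in Q\}$.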

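Proof. Let $X_0^\Delta = x$. For some $K_0 < \infty$, we have

$$A \equiv E_x^\alpha\exp\rho[V(X_1^\Delta) - V(x)] \le E_x^\alpha\exp\rho\big[V_x'(x)(b(x,\alpha)\Delta + \sigma(x)\delta w) + K_0(|\delta w|^2 + |b(x,\alpha)|^2\Delta^2)\big].$$

Note that for $2rk\Delta < 1$,

$$E_x^\alpha\exp k|\delta w|^2 \le 1/(1-2rk\Delta).$$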
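The bound on $E\exp k|\delta w|^2$ can be checked directly, assuming (as the non-degeneracy of $a(\cdot)$ suggests) that $w(\cdot)$ is a standard $r$-dimensional Wiener process, so that the components of $\delta w$ are independent $N(0,\Delta)$. Each factor is a one-dimensional Gaussian integral, and Bernoulli's inequality $(1-u)^r \ge 1-ru$ finishes the argument:

$$E\exp k|\delta w|^2 = \prod_{i=1}^rE\exp k(\delta w_i)^2 = (1-2k\Delta)^{-r/2} \le (1-2k\Delta)^{-r} \le \frac{1}{1-2rk\Delta},\qquad 2rk\Delta < 1.$$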

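Thus, for small $\rho$, $\Delta$, and $k_1 > 0$, $k_2 > 0$ with $1/k_1 + 1/k_2 = 1$, Hölder's inequality yields

$$E_x^\alpha\exp\rho\big[V_x'(x)\sigma(x)\delta w + K_0|\delta w|^2\big] \le \big[\exp(\rho^2k_1^2\Delta V_x'(x)a(x)V_x(x)/2)\big]^{1/k_1}\cdot\frac{1}{(1-2r\rho k_2K_0\Delta)^{1/k_2}}.$$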
Thus, for small $\rho$, $\Delta$ and $k_1$ fixed near unity,

$$A \le \exp\rho\Big[V_x'(x)b(x,\alpha)\Delta + K_0|b(x,\alpha)|^2\Delta^2 + \frac{k_1\rho\Delta}{2}V_x'(x)a(x)V_x(x) + 4rK_0\Delta\Big].$$

Thus, there are a compact set $Q$ and $\epsilon_1 > 0$ such that for small $\rho$ and for $x \notin Q$, $A \le \exp(-2\rho\epsilon_1\Delta)$. Thus for small $\rho$ and $x \notin Q$,

$$E_x^\alpha\exp(\rho\Delta\epsilon_1)\exp\rho V(X_1^\Delta) \le \exp\rho V(x).$$

Hence

$$E_x^\alpha\exp(\rho\epsilon_1\tau^\Delta)\cdot\exp\rho V(x^\Delta(\tau^\Delta)) \le \exp\rho V(x),$$

which yields the result, as in Theorem 2.1. Q.E.D.




7.  Ergodic  Properties  of 


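We now set up the machinery so that results analogous to those in Sections 4 and 5 and the limits as $\Delta \to 0$ can be obtained. Define $\Gamma$, $G$, $\Gamma_1$ and $G_1$ as in Section 3. Define the stopping times

$$\tau_0^\Delta = \inf\{t: x^\Delta(t) \in G\},\qquad \sigma_0^\Delta = \inf\{t > \tau_0^\Delta: x^\Delta(t) \notin G_1 - \Gamma_1\},$$

and, for $n \ge 1$,

$$\tau_n^\Delta = \inf\{t > \sigma_{n-1}^\Delta: x^\Delta(t) \in G\},\qquad \sigma_n^\Delta = \inf\{t > \tau_n^\Delta: x^\Delta(t) \notin G_1 - \Gamma_1\}.$$

For $x = x^\Delta(0) \in G$, we use $\tau^\Delta$ to denote $\tau_1^\Delta - \tau_0^\Delta = \tau_1^\Delta$, the canonical return time to $G$.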

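By Theorem 6.1, there are $G$, $G_1$ such that (e.g., let $G$ equal the set $Q$ of Theorem 6.1)

$$\sup_{x\in G,\,\alpha\in A_0}E_x^\alpha\tau^\Delta < \infty,\qquad \sup_{x\in G,\,\alpha\in A_0}E_x^\alpha\exp\rho\tau^\Delta < \infty, \tag{7.1}$$

for small $\rho$ (and small $\Delta$). Define $X_n^\Delta = x^\Delta(\tau_n^\Delta)$. For $\alpha \in A_0$, the process $\{X_n^\Delta, n \ge 0\}$ is a homogeneous positive recurrent Markov chain with state space $G$. Let $P^\Delta(x,n,\cdot\,|\,\alpha)$ denote its transition function. There is a unique invariant measure $\bar\mu^\Delta(\alpha)$. Analogously to the situation in Section 3, define the following:

$$\tau^\Delta(A) = \int_0^{\tau^\Delta}I_A(x^\Delta(s))\,ds,\qquad A = \text{Borel set in } R^r,$$

$$\nu^\Delta(A,\alpha) = \int_G\bar\mu^\Delta(dx,\alpha)E_x^\alpha\tau^\Delta(A),$$

$$\mu^\Delta(A,\alpha) = \nu^\Delta(A,\alpha)/\nu^\Delta(R^r,\alpha).$$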
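The same argument used to show that $\mu(\alpha)$ is invariant for $x(\cdot)$ ([2], p. 183) can be used to show that $\mu^\Delta(\alpha)$ is invariant for $x^\Delta(\cdot)$ (the chain (5.1) with $\alpha$ in place of $\alpha_0$). We can now write, for bounded measurable $f$,

$$(\mu^\Delta(\alpha),f) = \frac{\int_G\bar\mu^\Delta(dx,\alpha)E_x^\alpha\int_0^{\tau^\Delta}f(x^\Delta(s))\,ds}{\int_G\bar\mu^\Delta(dx,\alpha)E_x^\alpha\tau^\Delta}. \tag{7.2}$$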


Let $L^\infty(G)$ denote the set of bounded Borel measurable functions on $G$. Define the operator $P^\Delta(\alpha)$ on $L^\infty(G)$ by $P^\Delta(\alpha)f(x) = E_x^\alpha f(X_1^\Delta)$, $x \in G$.

Lemma 7.1. Assume (A2.1)-(A2.4) and (A6.1). Then the set $\{P^\Delta(\alpha)f: f \in L^\infty(G), \|f\| \le 1, \Delta > 0, \alpha \in A_0\}$ is equicontinuous.


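Remark on the proof. Define the process $y^\Delta(\cdot)$ to be the piecewise constant interpolation (intervals $[n\Delta,n\Delta+\Delta)$) of the process defined by $Y_0^\Delta = x$, $Y_{n+1}^\Delta = Y_n^\Delta + \sigma(Y_n^\Delta)\delta w(n\Delta)$. Then $y^\Delta(\cdot) \Rightarrow y(\cdot)$, defined in Lemma 4.1. Define the Radon-Nikodym derivative $\exp\zeta^\Delta(0,T)$, where

$$\zeta^\Delta(0,T) = \sum_{n=0}^{T/\Delta-1}\big[\sigma^{-1}(Y_n^\Delta)b(Y_n^\Delta,\alpha)\big]'\delta w(n\Delta) - \frac12\sum_{n=0}^{T/\Delta-1}\big|\sigma^{-1}(Y_n^\Delta)b(Y_n^\Delta,\alpha)\big|^2\Delta.$$

From this point on, the proof is nearly identical to that of Lemma 4.1 and is omitted.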
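A Python sketch of $\zeta^\Delta(0,T)$ for a scalar chain, with a one-step check that $\exp\zeta^\Delta$ is exactly the ratio of the Gaussian transition densities of the drifted chain $Y_{n+1} = Y_n + \Delta b(Y_n,\alpha) + \sigma(Y_n)\delta w$ and of the driftless chain defining $y^\Delta(\cdot)$. The scalar state, the helper names, and the drift and noise functions in the usage are assumptions for illustration only.

```python
import math


def log_rn_derivative(ys, dws, b, sigma, alpha, delta):
    """zeta^Delta(0, T) along a scalar path of Y_{n+1} = Y_n + sigma(Y_n) dw_n.

    ys = [Y_0, ..., Y_N] and dws = [dw_0, ..., dw_{N-1}].
    """
    zeta = 0.0
    for y, dw in zip(ys[:-1], dws):
        u = b(y, alpha) / sigma(y)            # sigma^{-1}(Y_n) b(Y_n, alpha)
        zeta += u * dw - 0.5 * u * u * delta
    return zeta


def gaussian_logpdf(v, mean, var):
    return -0.5 * math.log(2.0 * math.pi * var) - (v - mean) ** 2 / (2.0 * var)


if __name__ == "__main__":
    b = lambda y, a: -a * y
    sigma = lambda y: 1.0 + 0.5 * math.sin(y) ** 2
    y0, dw, alpha, delta = 0.7, 0.13, 1.5, 0.01
    y1 = y0 + sigma(y0) * dw                  # one driftless step
    var = sigma(y0) ** 2 * delta
    lhs = log_rn_derivative([y0, y1], [dw], b, sigma, alpha, delta)
    # log of (drifted transition density) / (driftless transition density)
    rhs = (gaussian_logpdf(y1, y0 + delta * b(y0, alpha), var)
           - gaussian_logpdf(y1, y0, var))
    print(lhs, rhs)
```

Because the chains are Markov, the multi-step ratio is the product of these one-step ratios, which is why $\zeta^\Delta$ is a plain sum over steps.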


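Theorem 7.1. Assume (A2.1)-(A2.4) and (A6.1), and let $\alpha = \alpha_0$. Then $X_1^\Delta \Rightarrow X_1$ if $X_0^\Delta \Rightarrow X_0$, and $\bar\mu^\Delta(\alpha_0) \Rightarrow \bar\mu(\alpha_0)$. In addition, $E_x^{\alpha_0}f(X_1^\Delta) \to E_x^{\alpha_0}f(X_1)$ uniformly in $x \in G$ and in $f$ in any equicontinuous set with $\|f\| \le 1$. Also, $\bar\mu^\Delta(\alpha_0+\delta\alpha) \Rightarrow \bar\mu^\Delta(\alpha_0)$ as $\delta\alpha \to 0$, and $\mu^\Delta(\alpha_0) \Rightarrow \mu(\alpha_0)$. Finally,

$$f^{\Delta,\alpha_0} \equiv (\bar\mu^\Delta(\alpha_0),f) \to (\bar\mu(\alpha_0),f) \equiv f^{\alpha_0},$$

uniformly for $f$ in any equicontinuous set with $\|f\| \le 1$.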

Proof. Note that $(x^\Delta(\cdot),\tau^\Delta) \to (x(\cdot),\tau)$ uniformly in $x \in G$, in the sense that $E_x^{\alpha_0}F(x^\Delta(\cdot),\tau^\Delta) \to E_x^{\alpha_0}F(x(\cdot),\tau)$ uniformly in $x \in G$ for any bounded and continuous real-valued $F(\cdot)$. The weak convergence $X_1^\Delta \Rightarrow X_1$ (if $X_0^\Delta \Rightarrow X_0$) follows from the uniform integrability of $\{\tau^\Delta, \Delta > 0, \alpha \in A_0\}$ and the (uniform) weak convergence of $x^\Delta(\cdot)$ to $x(\cdot)$. The asserted weak convergence can be proved by a standard martingale method [5], [6], using the non-degeneracy of $a(\cdot)$ and the smoothness of $\Gamma$, $\Gamma_1$ to get the weak convergence of $\tau^\Delta$. In fact, a standard weak convergence method can be used to get $P^\Delta(\alpha_0)f \to P(\alpha_0)f$, uniformly in $f$ in any equicontinuous set in $C(G)$.

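Now, for $f \in C(G)$, by the invariance of $\bar\mu^\Delta(\alpha_0)$, we can write

$$(\bar\mu^\Delta(\alpha_0),f) = (\bar\mu^\Delta(\alpha_0),P^\Delta(\alpha_0)f) = (\bar\mu^\Delta(\alpha_0),P(\alpha_0)f) + (\bar\mu^\Delta(\alpha_0),[P^\Delta(\alpha_0) - P(\alpha_0)]f).$$

The family $\{\bar\mu^\Delta(\alpha_0), \Delta > 0\}$ is obviously tight since $G$ is compact. If $\tilde\mu(\alpha_0)$ is the limit of a weakly convergent subsequence, then by the last expression (whose second term goes to zero) we have

$$(\tilde\mu(\alpha_0),f) = (\tilde\mu(\alpha_0),P(\alpha_0)f),\qquad f \in C(G),$$

which yields $\tilde\mu(\alpha_0) = \bar\mu(\alpha_0)$ by the uniqueness of the invariant measure.

Now use (7.2), the weak convergence $\{\tau^\Delta,x^\Delta(\cdot)\} \Rightarrow \{\tau,x(\cdot)\}$, the uniform integrability of $\{\tau^\Delta\}$ (Theorem 6.1), and $\bar\mu^\Delta(\alpha_0) \Rightarrow \bar\mu(\alpha_0)$ to get $\mu^\Delta(\alpha_0) \Rightarrow \mu(\alpha_0)$. The last assertion of the theorem is also proved by an argument by contradiction, and the proof is omitted. Q.E.D.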

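An analog of (3.6). The following lemma is needed to get an analog of Lemma 4.4.

Lemma 7.2. Assume (A2.1)-(A2.3) and (A6.1). Let $k$ be such that $C\psi^k = \lambda < 1$ (see (3.5)). Let $C'(G) \subset C(G)$ be an equicontinuous set. Then

$$\limsup_{\Delta\to0}\ \sup_{x\in G}\ \sup_{f\in C'(G)}\ \frac{|E_x^{\alpha_0}f(X_k^\Delta) - f^{\Delta,\alpha_0}|}{\|f - f^{\Delta,\alpha_0}\|} \le \lambda. \tag{7.3}$$

Consequently, there are $\psi_1 < 1$, $C_1 < \infty$ such that for small $\Delta > 0$,

$$\|(P^\Delta(\alpha_0))^nf - f^{\Delta,\alpha_0}\| \le C_1\psi_1^n\|f - f^{\Delta,\alpha_0}\|. \tag{7.4}$$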


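Proof. Suppose that (7.3) is false. Then there are $x_n \to x \in G$, $\Delta_n \to 0$, $\lambda_n > \lambda_0 > \lambda$, and $f_n \in C'(G)$ with $f_n \to f \in C'(G)$, such that

$$\frac{|E_{x_n}^{\alpha_0}f_n(X_k^{\Delta_n}) - f_n^{\Delta_n,\alpha_0}|}{\|f_n - f_n^{\Delta_n,\alpha_0}\|} \ge \lambda_n.$$

Without loss of generality, we can suppose that the infima of the denominators are positive. Then we can write

$$\lambda_n \le \frac{|E_x^{\alpha_0}f(X_k) - f^{\alpha_0}|}{\|f_n - f_n^{\Delta_n,\alpha_0}\|} + \frac{|E_{x_n}^{\alpha_0}f_n(X_k^{\Delta_n}) - E_x^{\alpha_0}f(X_k)|}{\|f_n - f_n^{\Delta_n,\alpha_0}\|} + \frac{|f_n^{\Delta_n,\alpha_0} - f^{\alpha_0}|}{\|f_n - f_n^{\Delta_n,\alpha_0}\|}. \tag{7.5}$$

The last two terms on the right go to zero by the weak convergence $X_k^{\Delta_n} \Rightarrow X_k$ (initial conditions $x_n$ and $x$, resp.), the convergence $\bar\mu^{\Delta_n}(\alpha_0) \Rightarrow \bar\mu(\alpha_0)$, and the convergence $f_n \to f$. The first term on the right goes to $|E_x^{\alpha_0}f(X_k) - f^{\alpha_0}|/\|f - f^{\alpha_0}\| \le C\psi^k = \lambda$, and we have a contradiction.

Inequality (7.4) follows from (7.3) by letting $\psi_1^k = \lambda + \delta\lambda$ for small $\delta\lambda > 0$, and iterating. Q.E.D.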

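Lemma 7.3. Assume (A2.1)-(A2.5) and (A6.1). Then for $f \in L^\infty(G)$,

$$\frac{[P^\Delta(\alpha_0+\delta\alpha) - P^\Delta(\alpha_0)]f}{\delta\alpha}$$

converges (as $\delta\alpha \to 0$) to the function $P_\alpha^\Delta(\alpha_0)f$ with values

$$E_x^{\alpha_0}Z^\Delta(\tau^\Delta,\alpha_0)f(X_1^\Delta) = \frac{\partial}{\partial\alpha}E_x^\alpha f(X_1^\Delta)\Big|_{\alpha=\alpha_0}.$$

The limit is continuous and the convergence is uniform in $\Delta$, $x \in G$, and in $f \in C(G)$ with $\|f\| \le 1$. The set $\{E_x^{\alpha_0}Z^\Delta(\tau^\Delta,\alpha_0)f(X_1^\Delta), \Delta > 0, f \in C(G), \|f\| \le 1\}$ is equicontinuous.

The same result holds for the convergence

$$\frac{1}{\delta\alpha}\Big[E_x^{\alpha_0+\delta\alpha}\int_0^{\tau^\Delta}f(x^\Delta(s))\,ds - E_x^{\alpha_0}\int_0^{\tau^\Delta}f(x^\Delta(s))\,ds\Big] \to E_x^{\alpha_0}\int_0^{\tau^\Delta}f(x^\Delta(s))Z^\Delta(s,\alpha_0)\,ds.$$

The proof is analogous to that of Lemma 4.2 but uses the Radon-Nikodym derivative introduced in the remark under Lemma 7.1, and is omitted.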

Theorem 7.2. Assume (A2.1)-(A2.5) and (A6.1). Then $\bar\mu_\alpha^\Delta(\alpha_0)$ and $\mu_\alpha^\Delta(\alpha_0)$ exist in the sense of weak convergence.

Proof. Let $C_0^\Delta(G)$ be the subset of $C(G)$ for which $(f,\bar\mu^\Delta(\alpha_0)) = 0$. Following the proof of Lemma 4.4 and its corollary, we first show the invertibility of $(I - P^\Delta(\alpha_0))$ on $C_0^\Delta(G)$, on which we identify functions that are equal a.e. ($\bar\mu^\Delta(\alpha_0)$). By Lemma 7.1 and the fact that $\bar\mu^\Delta(\alpha_0)$ is an invariant measure for the transition function which defines $P^\Delta(\alpha_0)$, we have $(I - P^\Delta(\alpha_0))C_0^\Delta(G) \subset C_0^\Delta(G)$. By Lemma 7.2, for $f \in C_0^\Delta(G)$ the sums below converge and

$$(I - P^\Delta(\alpha_0))\sum_{n=0}^\infty(P^\Delta(\alpha_0))^nf = \sum_{n=0}^\infty(P^\Delta(\alpha_0))^n(I - P^\Delta(\alpha_0))f = f.$$

These facts yield that the inverse is

$$g^\Delta = (I - P^\Delta(\alpha_0))^{-1}f = \sum_{n=0}^\infty(P^\Delta(\alpha_0))^nf. \tag{7.6}$$
By Lemmas 7.1 and 7.2, the sum on the right side converges uniformly in $\Delta$, and it is equicontinuous for $f \in C_0^\Delta(G)$, $\|f\| \le 1$, $\Delta > 0$.
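For a finite state space the inverse (7.6) can be computed literally. The Python sketch below uses a hypothetical three-state transition matrix standing in for $P^\Delta(\alpha_0)$: it computes the invariant distribution by power iteration, centers $f$, and sums the series $\sum_n(P^\Delta(\alpha_0))^nf$ until the terms are negligible. The geometric decay of those terms is what Lemma 7.2 supplies in the general case; here it comes from the second eigenvalue of the matrix, which is $0.3$.

```python
def mat_vec(p, v):
    """(P v)(i) = sum_j P[i][j] v[j]: P acting on functions."""
    return [sum(pij * vj for pij, vj in zip(row, v)) for row in p]


def vec_mat(mu, p):
    """(mu P)(j) = sum_i mu[i] P[i][j]: P acting on measures."""
    n = len(p)
    return [sum(mu[i] * p[i][j] for i in range(n)) for j in range(n)]


def invariant_distribution(p, iters=1000):
    """Invariant probability vector by power iteration (assumes an ergodic chain)."""
    mu = [1.0 / len(p)] * len(p)
    for _ in range(iters):
        mu = vec_mat(mu, p)
    return mu


def poisson_neumann(p, mu, f, tol=1e-13, max_terms=100_000):
    """Return (g, f_bar) with g = sum_n P^n (f - f_bar) and f_bar = (mu, f).

    g is the mu-mean-zero solution of (I - P) g = f - f_bar, as in (7.6).
    """
    f_bar = sum(m * fi for m, fi in zip(mu, f))
    term = [fi - f_bar for fi in f]
    g = [0.0] * len(f)
    for _ in range(max_terms):
        g = [gi + ti for gi, ti in zip(g, term)]
        if max(abs(t) for t in term) < tol:
            break
        term = mat_vec(p, term)
    return g, f_bar


if __name__ == "__main__":
    P = [[0.5, 0.3, 0.2], [0.2, 0.6, 0.2], [0.3, 0.3, 0.4]]
    f = [1.0, -2.0, 0.5]
    mu = invariant_distribution(P)
    g, f_bar = poisson_neumann(P, mu, f)
    residual = [gi - pgi - (fi - f_bar)
                for gi, pgi, fi in zip(g, mat_vec(P, g), f)]
    print(mu, g, max(abs(r) for r in residual))
```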

We can now use a proof analogous to that of Theorem 4.1 (but using weak rather than setwise convergence), together with Lemma 7.3 and the weak convergence $\bar\mu^\Delta(\alpha_0+\delta\alpha) \Rightarrow \bar\mu^\Delta(\alpha_0)$, to get the existence of $\bar\mu_\alpha^\Delta(\alpha_0)$ in the sense of weak convergence; the few details are omitted. To get the existence of $\mu_\alpha^\Delta(\alpha_0)$ in the sense of weak convergence, use the representation (7.2) and the $\alpha$-differentiability at $\alpha = \alpha_0$ of $\bar\mu^\Delta(\alpha)$, of $E_x^\alpha\int_0^{\tau^\Delta}f(x^\Delta(s))\,ds$, and of $E_x^\alpha\tau^\Delta$. The details are like those of Theorem 4.2, but use the equicontinuity of $\{E_x^\alpha\int_0^{\tau^\Delta}f(x^\Delta(s))\,ds\}$ (in $f \in C(G)$, $\Delta > 0$, $\|f\| \le 1$, $\alpha \in A_0$), the weak convergence, and the uniform integrability of $\{\tau^\Delta, \text{small } \Delta > 0, \alpha \in A_0\}$. Q.E.D.

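Corollary. Assume the conditions of the theorem. Then $\bar\mu_\alpha^\Delta(\alpha_0)$ exists in the sense of setwise convergence. Also $\{\bar\mu_\alpha^\Delta(\alpha_0), \text{small } \Delta > 0\}$ is of bounded variation. For $g \in L^\infty(G)$, there is a unique $f \in L^\infty(G)$ such that

$$(I - P^\Delta(\alpha_0))f = g - g^{\Delta,\alpha_0}$$

and $f^{\Delta,\alpha_0} = 0$.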

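Proof. Let $f \in L^\infty(G)$. Analogously to (4.9), write $\delta P^\Delta(\alpha) = P^\Delta(\alpha_0+\delta\alpha) - P^\Delta(\alpha_0)$, $\delta\bar\mu^\Delta(\alpha) = \bar\mu^\Delta(\alpha_0+\delta\alpha) - \bar\mu^\Delta(\alpha_0)$, and

$$\Big(\frac{\delta\bar\mu^\Delta(\alpha)}{\delta\alpha},f\Big) = \Big(\frac{\delta\bar\mu^\Delta(\alpha)}{\delta\alpha},P^\Delta(\alpha_0)f\Big) + \Big(\bar\mu^\Delta(\alpha_0),\frac{\delta P^\Delta(\alpha)}{\delta\alpha}f\Big) + \Big(\delta\bar\mu^\Delta(\alpha),\frac{\delta P^\Delta(\alpha)}{\delta\alpha}f\Big). \tag{7.7}$$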

By Lemma 7.3, $(\delta P^\Delta(\alpha)/\delta\alpha)f$ converges to a continuous function, uniformly in $x \in G$. This and $\delta\bar\mu^\Delta(\alpha) \Rightarrow$ zero measure imply that the last term on the right of (7.7) tends to zero as $\delta\alpha \to 0$. By the same lemma, the second term on the right of (7.7) tends to $(\bar\mu^\Delta(\alpha_0),P_\alpha^\Delta(\alpha_0)f)$. Since $P^\Delta(\alpha_0)f$ is continuous (Lemma 7.1) and $\bar\mu_\alpha^\Delta(\alpha_0)$ exists in the sense of weak convergence (by the theorem), the first term on the right tends to $(\bar\mu_\alpha^\Delta(\alpha_0),P^\Delta(\alpha_0)f)$. Thus the limit of the left side of (7.7) exists. Now, the form of the limit of the right side implies that $\bar\mu_\alpha^\Delta(\alpha_0)$ exists in the sense of setwise convergence.
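Rewrite the limit of (7.7) as

$$(\bar\mu_\alpha^\Delta(\alpha_0),(I - P^\Delta(\alpha_0))f) = (\bar\mu^\Delta(\alpha_0),P_\alpha^\Delta(\alpha_0)f).$$

For $g \in L^\infty(G)$, set $\bar g = g - g^{\Delta,\alpha_0}$ and define

$$f^\Delta = \sum_{n=0}^\infty(P^\Delta(\alpha_0))^n\bar g = \bar g + \sum_{n=0}^\infty(P^\Delta(\alpha_0))^n(P^\Delta(\alpha_0)\bar g). \tag{7.8}$$

The sum converges uniformly in $g$ and $\Delta$ for $\|g\| \le 1$, since $\{P^\Delta(\alpha_0)g, \Delta > 0, g \in L^\infty(G), \|g\| \le 1\}$ is equicontinuous by Lemma 7.1, so that Lemma 7.2 applies. The uniqueness assertion follows.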

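Thus

$$(I - P^\Delta(\alpha_0))f^\Delta = \bar g$$

and, since $\bar\mu_\alpha^\Delta(\alpha_0)$ assigns zero mass to $G$,

$$(\bar\mu_\alpha^\Delta(\alpha_0),g) = (\bar\mu_\alpha^\Delta(\alpha_0),\bar g) = (\bar\mu^\Delta(\alpha_0),P_\alpha^\Delta(\alpha_0)f^\Delta).$$

The bounded variation assertion follows from this representation. Q.E.D.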
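The representation $(\bar\mu_\alpha^\Delta(\alpha_0),g) = (\bar\mu^\Delta(\alpha_0),P_\alpha^\Delta(\alpha_0)f^\Delta)$ can be checked on a finite chain. In the Python sketch below, the parametrized family $P(\alpha) = (1-\alpha)A + \alpha B$ and the matrices $A$, $B$ are hypothetical choices, so that $P_\alpha = B - A$. The derivative obtained from the Poisson solution (7.8) is compared with a central finite difference of $\alpha \mapsto (\bar\mu(\alpha),g)$.

```python
def mat_vec(p, v):
    return [sum(pij * vj for pij, vj in zip(row, v)) for row in p]


def vec_mat(mu, p):
    n = len(p)
    return [sum(mu[i] * p[i][j] for i in range(n)) for j in range(n)]


def invariant_distribution(p, iters=1000):
    mu = [1.0 / len(p)] * len(p)
    for _ in range(iters):
        mu = vec_mat(mu, p)
    return mu


def poisson_solution(p, mu, g, tol=1e-13, max_terms=100_000):
    """Mean-zero f with (I - P) f = g - (mu, g), via the series (7.8)."""
    g_bar = sum(m * gi for m, gi in zip(mu, g))
    term = [gi - g_bar for gi in g]
    f = [0.0] * len(g)
    for _ in range(max_terms):
        f = [fi + ti for fi, ti in zip(f, term)]
        if max(abs(t) for t in term) < tol:
            break
        term = mat_vec(p, term)
    return f


def p_of(alpha, a, b):
    """Hypothetical family P(alpha) = (1 - alpha) A + alpha B, so P_alpha = B - A."""
    return [[(1.0 - alpha) * x + alpha * y for x, y in zip(ra, rb)]
            for ra, rb in zip(a, b)]


def invariant_derivative(alpha, a, b, g):
    """(mu_alpha(alpha), g) computed as (mu(alpha), P_alpha f)."""
    p = p_of(alpha, a, b)
    mu = invariant_distribution(p)
    f = poisson_solution(p, mu, g)
    p_alpha = [[y - x for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]
    return sum(m * v for m, v in zip(mu, mat_vec(p_alpha, f)))


if __name__ == "__main__":
    A = [[0.5, 0.3, 0.2], [0.2, 0.6, 0.2], [0.3, 0.3, 0.4]]
    B = [[0.1, 0.6, 0.3], [0.4, 0.4, 0.2], [0.2, 0.2, 0.6]]
    g = [1.0, -2.0, 0.5]
    alpha, h = 0.3, 1e-5

    def ergodic_mean(al):
        mu = invariant_distribution(p_of(al, A, B))
        return sum(m * gi for m, gi in zip(mu, g))

    fd = (ergodic_mean(alpha + h) - ergodic_mean(alpha - h)) / (2.0 * h)
    print(invariant_derivative(alpha, A, B, g), fd)
```

The identity behind the check is the differentiated invariance equation $\bar\mu_\alpha(I-P) = \bar\mu P_\alpha$, applied to the mean-zero solution $f$ of $(I-P)f = g - (\bar\mu,g)$.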
The  convergence  Theorem  for  the  discretizations. 
Theorem 7.3. Assume (A2.1)-(A2.5) and (A6.1). Then $\bar\mu_\alpha^\Delta(\alpha_0)$ converges setwise to $\bar\mu_\alpha(\alpha_0)$ and $\mu_\alpha^\Delta(\alpha_0)$ converges setwise to $\mu_\alpha(\alpha_0)$.

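Proof. Let $f \in L^\infty(G)$. Let $g^\Delta$ and $g$, resp., be the unique solutions in $L^\infty(G)$ (Theorem 7.2, Lemma 4.4) to

$$(I - P^\Delta(\alpha_0))g^\Delta = f - f^{\Delta,\alpha_0},\qquad (I - P(\alpha_0))g = f - f^{\alpha_0}.$$

Note that

$$\frac{\partial}{\partial\alpha}(\bar\mu^\Delta(\alpha),g^\Delta)\Big|_{\alpha=\alpha_0} = (\bar\mu_\alpha^\Delta(\alpha_0),g^\Delta) = (\bar\mu_\alpha^\Delta(\alpha_0),P^\Delta(\alpha_0)g^\Delta) + (\bar\mu^\Delta(\alpha_0),P_\alpha^\Delta(\alpha_0)g^\Delta).$$

Then we can write

$$(\bar\mu_\alpha^\Delta(\alpha_0),f) = (\bar\mu_\alpha^\Delta(\alpha_0),f - f^{\Delta,\alpha_0}) = (\bar\mu_\alpha^\Delta(\alpha_0),(I - P^\Delta(\alpha_0))g^\Delta) \tag{7.9}$$

$$= \int\bar\mu^\Delta(dx,\alpha_0)\frac{\partial}{\partial\alpha}E_x^\alpha g^\Delta(X_1^\Delta)\Big|_{\alpha=\alpha_0}.$$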

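We have

$$\frac{\partial}{\partial\alpha}E_x^\alpha g^\Delta(X_1^\Delta)\Big|_{\alpha=\alpha_0} = E_x^{\alpha_0}g^\Delta(X_1^\Delta)Z^\Delta(\tau^\Delta,\alpha_0).$$

Now, note that the sum in (7.6) converges uniformly in $\Delta$ (Lemma 7.2); hence $g^\Delta \to g$, since $(P^\Delta(\alpha_0))^nf \to (P(\alpha_0))^nf$. Using this, the weak convergence of $\{X_1^\Delta, Z^\Delta(\tau^\Delta,\alpha_0)\}$, the uniform integrability of $\{Z^\Delta(\tau^\Delta,\alpha_0), \Delta > 0\}$, the fact that the functions in the integrand on the right side of (7.9) converge uniformly in $x \in G$ to a continuous limit, and the fact that $\bar\mu^\Delta(\alpha_0) \Rightarrow \bar\mu(\alpha_0)$, we find that the limit as $\Delta \to 0$ of the right side of (7.9) is

$$\int\bar\mu(dx,\alpha_0)E_x^{\alpha_0}g(X_1)Z(\tau,\alpha_0) = \int\bar\mu(dx,\alpha_0)\frac{\partial}{\partial\alpha}E_x^\alpha g(X_1)\Big|_{\alpha=\alpha_0}$$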
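$$= (\bar\mu_\alpha(\alpha_0),(I - P(\alpha_0))g) = (\bar\mu_\alpha(\alpha_0),f - f^{\alpha_0}) = (\bar\mu_\alpha(\alpha_0),f).$$

Thus

$$(\bar\mu_\alpha^\Delta(\alpha_0),f) \to (\bar\mu_\alpha(\alpha_0),f),$$

which yields the setwise convergence of $\bar\mu_\alpha^\Delta(\alpha_0)$ to $\bar\mu_\alpha(\alpha_0)$. The setwise convergence of $\mu_\alpha^\Delta(\alpha_0)$ to $\mu_\alpha(\alpha_0)$ follows from the representation (7.2). For example, to get the limit of the derivative of the denominator of (7.2), note that this derivative is

$$\int\bar\mu_\alpha^\Delta(dx,\alpha_0)E_x^{\alpha_0}\tau^\Delta + \int\bar\mu^\Delta(dx,\alpha_0)\frac{\partial}{\partial\alpha}E_x^\alpha\tau^\Delta\Big|_{\alpha=\alpha_0}.$$

Then use the representation

$$\frac{\partial}{\partial\alpha}E_x^\alpha\tau^\Delta\Big|_{\alpha=\alpha_0} = E_x^{\alpha_0}\tau^\Delta Z^\Delta(\tau^\Delta,\alpha_0),$$

and the proved convergence and uniform integrability (where appropriate) results for $\bar\mu_\alpha^\Delta(\alpha_0)$, $x^\Delta(\cdot)$, $\tau^\Delta$, $Z^\Delta(\tau^\Delta,\alpha_0)$. Q.E.D.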

A finite time approximation Theorem. The next result shows that the derivative $(\mu_\alpha(\alpha_0),f)$ of the ergodic cost can be arbitrarily well approximated by $\frac{\partial}{\partial\alpha}E_x^\alpha f(x^\Delta(t))|_{\alpha=\alpha_0}$, averaged with respect to $\mu^\Delta(\alpha_0)$ as in (7.10), for large $t$ and small $\Delta$. It is such approximations that are actually used in the applications. It is important to note that for large enough $t$, the quality of the approximation is uniformly good in (small) $\Delta$.

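Theorem 7.4. Assume (A2.1)-(A2.5) and (A6.1). Then for $f \in L^\infty(R^r)$,

$$(\mu_\alpha(\alpha_0),f) = \lim_{\Delta\to0}(\mu_\alpha^\Delta(\alpha_0),f) = \lim_{\Delta\to0,\,t\to\infty}\int\mu^\Delta(dx,\alpha_0)\frac{\partial}{\partial\alpha}E_x^\alpha f(x^\Delta(t))\Big|_{\alpha=\alpha_0}, \tag{7.10}$$

where the limits as $\Delta \to 0$, $t \to \infty$ can be taken in any way at all.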
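Proof. Write, by the invariance of $\mu^\Delta(\alpha)$ and the differentiability,

$$(\mu_\alpha^\Delta(\alpha_0),f) = \frac{\partial}{\partial\alpha}(\mu^\Delta(\alpha),P^\Delta(\alpha,t)f)\Big|_{\alpha=\alpha_0} = (\mu_\alpha^\Delta(\alpha_0),P^\Delta(\alpha_0,t)f) + (\mu^\Delta(\alpha_0),P_\alpha^\Delta(\alpha_0,t)f), \tag{7.11}$$

where $P^\Delta(\alpha,t)f(x) = E_x^\alpha f(x^\Delta(t))$ and $t > 0$. We have $P^\Delta(\alpha_0,t)f \to (\mu(\alpha_0),f)$ as $\Delta \to 0$, $t \to \infty$. Also, $\{\mu_\alpha^\Delta(\alpha_0), \Delta > 0\}$ is of bounded variation by the corollary to Theorem 7.2, and $\mu_\alpha^\Delta(\alpha_0)$ assigns zero mass to $R^r$. Thus $(\mu_\alpha^\Delta(\alpha_0),P^\Delta(\alpha_0,t)f) \to 0$ as $\Delta \to 0$ and $t \to \infty$, which yields the theorem. Q.E.D.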

A pathwise result. With the approximation of Theorem 7.4 in hand, we can give the pathwise result. Since we have only one long realization and cannot explicitly calculate the derivatives of the expectations, we need to show that a long simulation of $\{X_n^\Delta, n < \infty\}$ can yield a good approximation to the right side of (7.10) for fixed $\Delta$. Typically, the $t_0$ in Theorem 7.5 is chosen as large as possible, consistent with a modest sample variance.

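Theorem 7.5. Assume (A2.1)-(A2.5) and (A6.1). Fix $t_0 = n\Delta$. Let $f(\cdot)$ be bounded and continuous. Then as $T \to \infty$ (or with centered $f$ used as discussed in Section 5), w.p.1,

$$\frac1T\int_0^T\big[Z^\Delta(t_0+s,\alpha_0) - Z^\Delta(s,\alpha_0)\big]f(x^\Delta(t_0+s))\,ds \to \int\mu^\Delta(dx,\alpha_0)E_x^{\alpha_0}Z^\Delta(t_0,\alpha_0)f(x^\Delta(t_0)) \tag{7.12}$$

$$= \Big(\mu^\Delta(\alpha_0),\frac{\partial}{\partial\alpha}E_x^\alpha f(x^\Delta(t_0))\Big|_{\alpha=\alpha_0}\Big).$$

Proof. Fix $t_0$. Define $\delta Z^\Delta(t_0,s) = Z^\Delta(t_0+s,\alpha_0) - Z^\Delta(s,\alpha_0)$ and $Y^\Delta(t_0,s) = \delta Z^\Delta(t_0,s)f(x^\Delta(t_0+s))$. Then the process (parameter $T$) defined by

$$M^\Delta(T) = \sum_{n=0}^{T/\Delta-1}\big[Y^\Delta(t_0,n\Delta) - E_{n\Delta}Y^\Delta(t_0,n\Delta)\big]\Delta,$$

where $E_s$ denotes conditional expectation given the data up to time $s$,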
is a zero mean martingale whose variance is $O(T)$. Thus Kronecker's Lemma implies that $M^\Delta(T)/T \to 0$ w.p.1. This implies that, for the purpose of evaluating the limit of the left side of (7.12), we can replace it by

$$\frac1T\int_0^Tq(x^\Delta(s))\,ds, \tag{7.13}$$

where we define

$$q(y) = \frac{\partial}{\partial\alpha}E_y^\alpha f(x^\Delta(t_0))\Big|_{\alpha=\alpha_0} = E_y^{\alpha_0}Z^\Delta(t_0,\alpha_0)f(x^\Delta(t_0)),$$

so that, by the Markov property, $E_sY^\Delta(t_0,s) = q(x^\Delta(s))$. The function $q(\cdot)$ is continuous and bounded. Then, the ergodic properties of $\{X_n^\Delta, n < \infty\}$ imply that (as $T \to \infty$) (7.13) converges w.p.1 to its mean value $(\mu^\Delta(\alpha_0),q)$, which is just the center term of (7.12). Q.E.D.
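A Python sketch of the left side of (7.12): one long Euler path of (5.1), the running $Z^\Delta$, and a time average of $[Z^\Delta(t_0+s,\alpha_0) - Z^\Delta(s,\alpha_0)]f(x^\Delta(t_0+s))$ over grid points $s$. The Ornstein-Uhlenbeck drift $b(x,\alpha) = -\alpha x$ and $f(x) = x^2$ in the usage are assumptions chosen only because $\frac{d}{d\alpha}\frac{1}{2\alpha} = -\frac12$ at $\alpha = 1$ gives a reference value. Since $f$ is unbounded this lies outside the stated hypotheses, and both the Euler chain's own invariant variance and the finite $t_0$ shift the limit slightly away from $-1/2$.

```python
import math
import random
from collections import deque


def pathwise_derivative(b, b_alpha, sigma, f, alpha0, x0, delta, t0, horizon,
                        seed=0):
    """Time average of [Z(t0 + s) - Z(s)] f(x(t0 + s)) along one Euler path."""
    rng = random.Random(seed)
    lag = int(round(t0 / delta))
    n_steps = int(round(horizon / delta))
    x, z = x0, 0.0
    z_window = deque([z], maxlen=lag + 1)   # Z at the last lag + 1 grid points
    total, count = 0.0, 0
    for n in range(n_steps + lag):
        dw = rng.gauss(0.0, math.sqrt(delta))
        z += b_alpha(x, alpha0) / sigma(x) * dw     # uses X_n before the step
        x += delta * b(x, alpha0) + sigma(x) * dw   # X_{n+1}, scheme (5.1)
        z_window.append(z)
        if n + 1 >= lag:
            # z_window[0] is Z^Delta((n + 1 - lag) * delta), t0 time units back
            total += (z - z_window[0]) * f(x)
            count += 1
    return total / count


if __name__ == "__main__":
    est = pathwise_derivative(b=lambda x, a: -a * x, b_alpha=lambda x, a: -x,
                              sigma=lambda x: 1.0, f=lambda x: x * x,
                              alpha0=1.0, x0=0.0, delta=0.05, t0=3.0,
                              horizon=20_000.0, seed=3)
    print(est)  # compare with d/d(alpha) [1/(2 alpha)] = -0.5 at alpha = 1
```

Only a window of $t_0/\Delta + 1$ past values of $Z^\Delta$ is stored, so memory does not grow with the horizon $T$.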
8. Numerical Comparisons.


The approximation method of Section 7 has been simulated and compared with alternative methods on a variety of problems of dimension up to seven. Here, we comment on some comparisons with a finite difference method. The alternative methods are all described and discussed in [1], and we repeat only a few of the comments made there.

The basic procedure used for all methods takes one long simulation, over an interval $T_1$. A basic estimation interval $T_0$ is given, and the approximate model $X^\Delta(\cdot)$ is simulated. $N = T_1/T_0$ estimates of the derivative are made in the long simulation interval, each using $T_0$ units of time. Let $X_n$ denote the state of the system at the start of the $n$-th subinterval. Then $X_{n+1}$, the state at the end of the $n$-th subinterval, is the initial condition for the estimate on the $(n+1)$-st subinterval. The detailed results reported here are for a two-dimensional problem, with the parameter $\alpha$ a scalar. We comment on larger problems later. For the finite difference estimate, a pair of simulations must be taken, with the parameter set at $\alpha_0 \pm \delta\alpha$ for some small $\delta\alpha$. The samples of $\delta w$ in (5.1) for the second member of each pair were the same as those for the first member, with the samples being independent from pair to pair. This reduced the variance relative to what it would have been had all the samples of the $\delta w$ random variables been mutually independent, as in [1]. The reduction was particularly large if the system was linear and the cost function smooth, although there was a noticeable reduction in the variance in all cases tested.
The two-dimensional problem was the noise-driven Van der Pol equation

$$dx_1 = x_2\,dt,\qquad dx_2 = [10x_2(1 - x_1^2) - \alpha x_1]\,dt + dw,$$

where $\alpha_0 = 2$. Note that this system is degenerate. Nevertheless, the method works well. The cost function of interest was

$$\frac1S\int_0^Sk(x(s))\,ds$$

for large $S$, where

$$k(x) = I_{\{|x_2| > 0.3\}}.$$


The simplest estimator is

$$\frac1N\sum_{n=1}^N\frac{1}{T_0}\int_{nT_0}^{nT_0+T_0}\big[Z^\Delta(s,\alpha_0) - Z^\Delta(nT_0,\alpha_0)\big]k(X^\Delta(s))\,ds. \tag{8.1}$$

An “antithetic” variable method was always used, since it gives a reduced variance: let $N$ be an even number, and let the $\delta w$ samples used for the $2n$-th estimate be the negatives of those used for the $(2n-1)$-st estimate ($n = 1, 2, \ldots$), with the $\delta w$ used for the $(2n-1)$-st estimates ($n = 1, 2, \ldots, N/2$) being mutually independent.

The centered form, where $k(X^\Delta(s))$ is replaced by the centered $k(X^\Delta(s)) - \bar k(nT_0+T_0)$, the centering being a sample estimate of the value of the cost at the cited time, actually gave better results. This method is referred to as the AC-method (antithetic variable, centered) in the tables below. The centering term has zero mean, but it helps reduce the variance. As $N \to \infty$, (8.1) converges to

$$\frac{\partial}{\partial\alpha}\int\mu^\Delta(dx,\alpha_0)E_x^\alpha\frac{1}{T_0}\int_0^{T_0}k(X^\Delta(s))\,ds\Big|_{\alpha=\alpha_0}.$$
For large enough $T_0$, this is a good estimate of the desired derivative. A better procedure would be to divide the interval $[0,T_0]$ into a reasonable number of subintervals, to get a better approximation to the centered form discussed in Section 5. But one must keep in mind that the CPU time required for a large number of subdivisions might be better used for taking more samples.
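A rough Python sketch of the centered, antithetic estimator in the spirit of (8.1) for the Van der Pol model above. Several details are assumptions not fixed by the text: since the model is degenerate, the $Z$-increment is taken as $(\partial b_2/\partial\alpha)\,\delta w = -x_1\,\delta w$, using only the noise-driven component; the centering is a user-supplied constant `k_bar`; and the initial state, step size, and function names are arbitrary and hypothetical.

```python
import math
import random


def vdp_step(x1, x2, alpha, delta, dw):
    """Euler step for dx1 = x2 dt, dx2 = [10 x2 (1 - x1^2) - alpha x1] dt + dw."""
    return (x1 + delta * x2,
            x2 + delta * (10.0 * x2 * (1.0 - x1 * x1) - alpha * x1) + dw)


def ac_estimate(k, k_bar, alpha0=2.0, delta=0.01, t0=3.0, n_pairs=50, seed=0):
    """Antithetic, centered version of (8.1) along one long simulated path.

    Subintervals come in pairs: the second member reuses the negated noise
    increments of the first.  k_bar is a user-supplied centering constant.
    Assumed Z-increment for this degenerate model: (d b_2 / d alpha) dw = -x1 dw.
    """
    rng = random.Random(seed)
    m = int(round(t0 / delta))
    x1, x2 = 0.5, 0.0                      # arbitrary initial state
    estimates = []
    for _ in range(n_pairs):
        noise = [rng.gauss(0.0, math.sqrt(delta)) for _ in range(m)]
        for sign in (1.0, -1.0):
            z = 0.0                        # Z(s) - Z(n T0), restarted each subinterval
            integral = 0.0
            for dw in noise:
                dw_s = sign * dw
                z += -x1 * dw_s            # uses the state before the step
                x1, x2 = vdp_step(x1, x2, alpha0, delta, dw_s)
                integral += z * (k(x1, x2) - k_bar) * delta
            estimates.append(integral / t0)
    return sum(estimates) / len(estimates)


if __name__ == "__main__":
    cost = lambda x1, x2: 1.0 if abs(x2) > 0.3 else 0.0   # k(x) = I{|x2| > 0.3}
    print(ac_estimate(cost, k_bar=0.364, n_pairs=20))
```

Twenty pairs is far too few for a usable estimate; the comparisons reported below use $N = 5{,}000$ subintervals for this method. The path is continued across subintervals, so each subinterval starts where the previous one ended, as described above.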

A third method, called the weighted AC-method, was often (but not always) advantageous. As $s - nT_0 \to \infty$, the variance of $[Z^\Delta(s,\alpha_0) - Z^\Delta(nT_0,\alpha_0)]$ goes to infinity. If the system has a “short” memory, then the “earlier” part of the $Z^\Delta(\cdot)$ process contributes little to the estimate in the following sense. Let $nT_0 + T_0 > s > s_0 > nT_0$, and write

$$[Z^\Delta(s,\alpha_0) - Z^\Delta(nT_0,\alpha_0)]k(X^\Delta(s)) = [Z^\Delta(s,\alpha_0) - Z^\Delta(s_0,\alpha_0)]k(X^\Delta(s)) + [Z^\Delta(s_0,\alpha_0) - Z^\Delta(nT_0,\alpha_0)]k(X^\Delta(s)).$$

Then the mean value of the second term goes to zero as $s - s_0 \to \infty$. But if we reduce the sample interval, then a bias is added. In order to balance these opposing effects, we use a weighted substitute $\tilde Z^\Delta$ for $Z^\Delta$, constructed as follows, where $\lambda \in (0,1)$ is a weighting factor, or exponential discount of the past (notation for the non-degenerate case):

$$\tilde Z^\Delta((n+1)\Delta) = \tilde Z^\Delta(n\Delta) + [\sigma^{-1}(X_n^\Delta)b_\alpha(X_n^\Delta,\alpha_0)]'\delta w(n\Delta) - \lambda\Delta\tilde Z^\Delta(n\Delta).$$

For the problem reported on here, this method gave excellent results. In other cases, where the “approach to ergodicity” is slower, a substantial bias could be introduced into the estimates.

Refer to the tables, where the sample means of the derivative estimates, their sample standard deviations, and the required CPU times are given. For the finite difference estimates, $N = 2{,}500$ was used, and $N = 5{,}000$ otherwise. This is because two system simulations are needed per finite difference estimate, and only one for our method. But the important quantity is the sample standard deviation per unit of CPU time. Note that the sample standard deviation for the weighted AC-method decreases as $T_0$ increases, while that for the AC-method increases. We can readily see the advantages of the methods introduced here. For linear systems, the finite difference method seems to work better, owing to the “smoothness” of the dependence of the estimates on the noise, and the value of the difference interval was not too important (it did not seriously affect the sample variance), as long as it was small enough to control the bias.

There are important dimensionality advantages to our methods. Suppose that the dimension of the parameter is $m$. Then, in order to get a single estimate of a gradient, a finite difference method needs to simulate the system either $m+1$ or $2m$ times, depending on the finite difference method used (one-sided or central). Our method requires the simulation of only one sample path per estimate, and the calculation of one $Z$-variable per component of the parameter. Moreover, the calculation of a $Z$-variable is usually much simpler than a simulation of the system. This is particularly true if the system is of high dimension, or if the dynamical terms are hard to compute. Thus, our methods require much less computer time than does the finite difference method, particularly for high-dimensional and nonlinear problems. Alternative methods, such as the finite difference method, can compensate for this only by having a better quality estimate, i.e., one with smaller bias or sample variance.
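The simulation counts in the preceding paragraph can be tabulated with a tiny Python helper; the method labels are hypothetical names introduced only for this sketch, and the counts are exactly those stated in the text.

```python
def gradient_cost(m, method):
    """(system path simulations, Z-variables) for one gradient estimate of an
    m-dimensional parameter, following the counts stated in the text."""
    if method == "one-sided":      # finite differences at alpha and alpha + delta e_i
        return m + 1, 0
    if method == "central":        # finite differences at alpha +/- delta e_i
        return 2 * m, 0
    if method == "z-variable":     # one path plus one Z-variable per component
        return 1, m
    raise ValueError(f"unknown method: {method}")


if __name__ == "__main__":
    for method in ("one-sided", "central", "z-variable"):
        print(method, gradient_cost(7, method))
```

For $m = 7$, the largest dimension mentioned above, this gives 8 or 14 system simulations for finite differences against one path and seven $Z$-variables for the method of this paper.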

We emphasize that no general rule has been found which can tell us which method would be preferable for any particular class of problems. All methods must be taken as serious candidates, and techniques sought for their realization so that they perform as well as possible.


$T_0 = 3$

Finite Difference ($\delta\alpha = .05$)
                  sample mean    sample standard deviation
    derivative       .168             .247
    cost             .363             .149
    CPU Time: 32.04

AC
                  sample mean    sample standard deviation
    derivative       .164             .216
    cost             .364             .127
    CPU Time: 18.9

Weighted AC (derivative only)
    $\lambda$     sample mean    sample standard deviation
    .1               .160             .19
    .5               .153             .14
    CPU Time: 20.1

TABLE 1
$T_0 = 10$

Finite Difference ($\delta\alpha = .05$)
                  sample mean    sample standard deviation
    derivative       .157             .243
    cost             .364             .052
    CPU Time: 104.8

AC
                  sample mean    sample standard deviation
    derivative       .162             .304
    cost             .364             .032
    CPU Time: 65.5

Weighted AC (derivative only)
    $\lambda$     sample mean    sample standard deviation
    .5               .157             .106
    1                .150             .07
    CPU Time: 68.3

TABLE 2
$T_0 = 20$

Finite Difference ($\delta\alpha = .05$)
                  sample mean    sample standard deviation
    derivative       .168             .246
    cost             .365             .032
    CPU Time: 209.8

AC
                  sample mean    sample standard deviation
    derivative       .168             .537
    cost             .365             .021
    CPU Time: 65.5

Weighted AC (derivative only)
    $\lambda$     sample mean    sample standard deviation
    .5               .154             .058
    1                .163             .101
    CPU Time: 137.05

TABLE 3